R Style Guide

Why have a style guide?

A "coding style" is a set of guidelines for writing readable code. Style covers the things that do not matter to the computer: the R interpreter mostly does not care about white space, for example, but human readers do. Good style is important because, while your code may have only one author, it will usually have multiple readers. Style guidelines, also known as "coding conventions", are especially important when working on projects with multiple people. Collaboration is a big part of why R is so popular, so make your code easy to read!

Here are some guidelines that I think produce readable R code. These guidelines are based on Google's R style guide, Hadley Wickham's tweaks to it, and my own R coding style. "Good style" is subjective. Some of the recommendations below are consensus "best practices", but others are simply arbitrary guidelines that provide consistency. Naming conventions, indentation depth, etc., are all arbitrary, but it can be helpful to agree on a specific style when working on a team.

Syntax

Line length

Keep your lines shorter than 80 characters. Longer lines are difficult to read.
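
One way to check a script against this limit is sketched below. The helper name findLongLines and its arguments are my own invention for illustration, not part of base R or of any package.

```r
# Hypothetical helper (an assumption for this example): report the lines
# of a character vector that are 80 characters or longer.
findLongLines <- function(lines, limit = 80) {
  line_lengths <- nchar(lines)
  too_long <- which(line_lengths >= limit)
  long_lines <- data.frame(line = too_long, n_chars = line_lengths[too_long])
  return(long_lines)
}

# Small made-up input; the second line is 85 characters long.
example_lines <- c("x <- 1", strrep("a", 85), "y <- x + 1")
long_lines <- findLongLines(example_lines)
```

For a real script you would pass in the result of readLines() on the file instead of example_lines.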

Boolean values

Use the boolean literals TRUE and FALSE rather than the variable names T and F. This issue is specific to R: T and F are ordinary variables that are defined by default but can be reassigned, whereas TRUE and FALSE are reserved words.

# good:
c(TRUE, FALSE)

# bad:
c(T, F) # dangerous!  for example:
T <- FALSE
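
The short sketch below illustrates the danger. The variable names are my own; it only assumes a fresh R session in which T has not already been redefined.

```r
# T is an ordinary variable, so it can be silently redefined.
T <- FALSE
t_is_true <- isTRUE(T)  # T no longer means TRUE

# TRUE is a reserved word; trying to assign to it is an error.
assign_result <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) "error"
)

# Remove our copy so that T refers to the default value again.
rm(T)
```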

Spacing

Place spaces around all binary operators (==, +, -, <-, =, etc.). Do not place a space before a comma, but always place one after a comma. It is especially important to place spaces around <- to avoid ambiguity. For example, is y<-3 a test (y < -3) or an assignment (y <- 3)?

# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
# Bad
average<-mean(feet/12+inches,na.rm=T)
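
To make the y<-3 ambiguity concrete, here is a minimal sketch with made-up values showing how R actually reads each form.

```r
y <- 5

# With spaces, the intent is unambiguous: this is a comparison.
is_below <- y < -3  # y is unchanged

# Without spaces, R reads "<-" as the assignment operator.
y<-3                # y is now 3
```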

It is OK to break the rule above occasionally to group items. For example, I think exponentiation is easiest to read without spaces, and sometimes you can emphasize order of operations, e.g., square <- x^2 + y*3. I also think it looks fine to have no spaces around the "=" sign when passing arguments in a function call.

# Fine
average <- mean(feet / 12 + inches, na.rm=TRUE)

Place a space before left parentheses, except in a function call.

# Good
if (debug)
plot(x, y)

# Bad
if(debug)
plot (x, y)

Extra spacing (i.e., more than one space in a row) is okay if it improves alignment of equals signs or arrows (<-).

l <- list(total = a + b + c,
          mean  = (a + b + c) / n,
          name  = "abc")

Do not place spaces around code in parentheses or square brackets (unless there is a comma, in which case the rule above applies: always place a space after it).

# Good
diamonds[5, 1]
diamonds[5, ]

# Bad
if ( debug )
x[1,]  # Needs a space after the comma
x[1 ,] # Space goes after, not before

Curly braces

An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else.

Always indent the code inside the curly braces.

There is a lot of variability in curly braces and indenting style.

# Good
if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

# Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
} 
else {
  y ^ x
}
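
The placement of else is more than a matter of taste: at the top level of a script, R treats the if statement as complete once it reaches the closing brace, so an else on the next line is a syntax error. The sketch below checks this with parse(); the helper name canParse is my own.

```r
# The two if/else layouts from above, stored as text so they can be parsed.
bad_code <- "if (y == 0) {\n  log(x)\n}\nelse {\n  y ^ x\n}"
good_code <- "if (y == 0) {\n  log(x)\n} else {\n  y ^ x\n}"

# Hypothetical helper: TRUE if the text is syntactically valid R.
canParse <- function(code) {
  is_valid <- tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
  return(is_valid)
}

bad_parses <- canParse(bad_code)
good_parses <- canParse(good_code)
```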

It is ok to leave very short statements on the same line:

if (y < 0 && debug) message("Y is negative")

Indentation

When indenting your code, use two (or four) spaces per level. Do not use tabs or mix tabs and spaces. Exception: when a line break occurs inside parentheses, align the wrapped line with the first character inside the parentheses. For example, if a function definition runs over multiple lines, indent the second line to where the argument list starts:

long_function_name <- function(a = "a long argument", b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Alternative to reduce lines for long definitions. End the first line
# with <- so that R knows the statement continues on the next line:

long_function_name <-
  function(a = "a long argument", b = "another argument",
           c = "another long argument", d = TRUE, e = FALSE) {
  # As usual code is indented by two spaces.
}

Assignment

I prefer that you use <- rather than = for assignment. This keeps assignment distinct from argument passing in a function call (=) and from testing equality (==).

# Good
x <- 5
# Bad
x = 5
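
The two operators also behave differently inside a function call, which is one more reason to keep their roles separate. The sketch below wraps the comparison in a function so that it does not depend on what is already defined in your workspace; the function name is hypothetical.

```r
# Hypothetical demonstration function: compares "=" and "<-" inside a call.
compareAssignmentOperators <- function() {
  here <- environment()

  # "=" inside a call names an argument; it does not create a variable.
  mean_eq <- mean(x = c(1, 2, 3))
  x_after_eq <- exists("x", envir = here, inherits = FALSE)

  # "<-" inside a call assigns x as a side effect, then passes the value.
  mean_arrow <- mean(x <- c(4, 5, 6))
  x_after_arrow <- exists("x", envir = here, inherits = FALSE)

  return(list(mean_eq = mean_eq, x_after_eq = x_after_eq,
              mean_arrow = mean_arrow, x_after_arrow = x_after_arrow))
}

result <- compareAssignmentOperators()
```

Only the second call leaves a variable named x behind.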

Semicolons

Do not terminate your lines with semicolons or use semicolons to put more than one command on the same line.

Notation and naming

File names

Script file names should end in .R and be meaningful. Use "-" or "_" to separate words in file names.

# Good
explore-bnames.R
schwilk-hw-1.R
# Bad
foo.txt
my-homework
homework.doc

Identifiers

Variable names should be lowercase. Use underscores "_" to separate words within a name. Generally, variable names should be nouns and function names should be verbs. Strive for concise but meaningful names (this is not easy!). If the variable has a unit, that unit should be included as a suffix in the name. Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.

# Good variables
day_one
day_1
n_days  # 'n' indicates a count
height_cm # it can be useful to indicate units

# Bad variables
first_day_of_the_month
DayOne
dayone
djm1

Don't litter your code with numeric literals. If you need to hard code numbers (i.e., include numeric constants), then place these near the top of your script. I prefer that such constants are UPPER_CASED.

# Good use of constants 
EARTH_RADIUS_KM <- 6371
# later in script
e <- estimateSpheroid(r=EARTH_RADIUS_KM*1000)


# Bad use of numeric literals
e <- estimateSpheroid(6371000)

Make function names verbs. I recommend using camelCase for functions to easily distinguish them from other objects, but this is very arbitrary.

# Good functions
countDays()
formatLine()

# bad
days()
fl()

Functions

Functions should have a single return() call just before the final brace. There are times to break this rule, but try not to.

# good
isNegative <- function(x) {
  if (x < 0) {
    is_neg <- TRUE
  } else {
    is_neg <- FALSE
  }
  return(is_neg)
}

# bad
isNegative <- function(x) {
  if (x < 0) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

Script organization

Ordering

A suggested order of elements in an R script:

  • Copyright statement comment
  • Author comment
  • File description comment, including purpose of program, inputs, and outputs
  • source() and library() statements
  • Constant definitions if applicable
  • Function definitions
  • Executed statements, if applicable (e.g., print, plot)
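
A minimal skeleton that follows this order might look like the sketch below. The copyright and author lines are placeholders, and the constant, function, and data are made up purely for illustration.

```r
# Copyright (c) <year> <copyright holder>
# Author: <your name>
# Description: Converts example heights from inches to centimeters.
#   Inputs: none (uses a small hard-coded example vector)
#   Outputs: prints the mean height in cm

# source() and library() statements
library(stats)

# Constant definitions
CM_PER_INCH <- 2.54

# Function definitions
convertInchesToCm <- function(height_in) {
  height_cm <- height_in * CM_PER_INCH
  return(height_cm)
}

# Executed statements
heights_in <- c(60, 65, 70)
heights_cm <- convertInchesToCm(heights_in)
print(mean(heights_cm))
```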

Commenting guidelines

Comment your code. Entire commented lines should begin with # (or ##) and one space. Comments should explain the why, not the what. Rarely add a comment to the end of a line; usually you should put comments on their own lines.

Use commented lines of - or = to break up your files into scannable chunks.
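
As a rough illustration of both points, the sketch below uses separator lines to mark sections and a comment that explains why a step is needed. The data and the -99 missing-value code are invented for the example.

```r
# Data preparation -------------------------------------------------

# The field sheets record missing heights as -99. Convert these to NA
# so that they are not averaged in as if they were real measurements.
MISSING_CODE <- -99
height_cm <- c(152, -99, 170)
height_cm[height_cm == MISSING_CODE] <- NA

# Summary statistics ===============================================

mean_height_cm <- mean(height_cm, na.rm = TRUE)
```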

Documenting functions using comments

It is important to document the arguments and the return value for a function. One place to do this is right above or at the beginning of a function. I recommend placing the "comment header" immediately before the function and following a consistent structure. Example:

# calculateSampleCovariance:
#   Computes the sample covariance between two vectors.
#
# Args:
#   x: One of two vectors whose sample covariance is to be calculated.
#   y: The other vector. x and y must have the same length, greater than one,
#      with no missing values.
#   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
#
# Returns:
#   The sample covariance between x and y.
calculateSampleCovariance <- function(x, y, verbose = TRUE) {
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y must have the same length, greater than one: ",
         length(x), " and ", length(y), ".")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose) {
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  }
  return(covariance)
}