R Style Guide

Why have a style guide?

A "coding style" is a set of guidelines for writing readable code. Style covers the things that do not matter to the computer: the R interpreter does not care about white space, for example, but human readers do. Good style is important because, while your code usually has only one author, it will usually have multiple readers. Style guidelines, also known as "coding conventions", are especially important when working on projects with multiple people. Collaboration is why R is so popular, so make your code easy to read!

Here are some guidelines that I think produce readable R code. These guidelines are based on Google's R style guide, Hadley Wickham's tweaks to it, and my own R coding style. "Good style" is subjective. Some of the recommendations below are consensus "best practices," but others are simply arbitrary guidelines that provide consistency. Naming conventions, indentation depth, etc., are all arbitrary, but it can be helpful to agree on a specific style when working with a team.

Syntax

Line length

Keep your lines shorter than 80 characters. Longer lines are difficult to read.

Boolean values

Literal boolean (logical) values are rare in code because we most often create logical values with comparison operators, e.g., x > 10. However, they do show up as arguments in function calls such as read.csv("fn.csv", stringsAsFactors=FALSE). Use the boolean literals TRUE and FALSE rather than the variable names T and F. This issue is specific to R, in which T and F are ordinary variables that are defined by default but can be reassigned.

# good:
c(TRUE, FALSE)

# bad:
c(T, F) # dangerous!  for example:
F <- TRUE
if(F) print("Unexpected!")
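
As a small sketch of why the literals are safer (the names t_is_true and assign_result are only illustrative): T can be rebound like any other variable, while TRUE is a reserved word and R refuses an assignment to it.

```r
# T is an ordinary variable, so a careless assignment changes its meaning.
T <- FALSE
t_is_true <- isTRUE(T)
# Remove our copy so that the default T is visible again.
rm(T)

# TRUE is a reserved word; an attempt to assign to it raises an error.
assign_result <- tryCatch({
  eval(parse(text = "TRUE <- FALSE"))
  "assigned"
}, error = function(e) {
  "error"
})
```

Here t_is_true ends up FALSE and assign_result ends up "error", so code that uses TRUE and FALSE cannot be broken by an earlier assignment elsewhere in the script.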

Spacing

Place spaces around all binary operators (==, +, -, <-, =, etc.). It is ok to omit the spaces around the exponentiation operator (^). Do not place a space before a comma, but always place one after a comma. It is especially important to place spaces around <- to avoid ambiguity, e.g., y<-3: is that a test (y < -3) or an assignment (y <- 3)?

# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
# Bad
average<-mean(feet/12+inches,na.rm=T)
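
The ambiguity of an unspaced arrow can be checked directly. A minimal sketch, with made-up variable names:

```r
y <- 5
# Without spaces, R reads "<-" as the assignment arrow: y becomes 3.
y<-3
after_unspaced <- y

y <- 5
# With spaces, the intent is unambiguous: this compares y with -3.
is_less <- y < -3
after_spaced <- y
```

After the unspaced line y is 3, whereas the spaced comparison returns FALSE and leaves y at 5. A reader should not have to know the parser's rules to tell these apart.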

It is ok to break the rule above to group items. For example, I think it is easiest to read exponentiation without spaces, and sometimes you can emphasize order of operations by omitting the spaces around the multiplication operator, e.g., square <- x^2 + y*3. I also think it looks fine to have no spaces around the "=" sign when passing parameters in a function call.

# Fine
average <- mean(feet / 12 + inches, na.rm=TRUE)

Place a space before left parentheses, except in a function call.

# Good
if (debug) plot(x, y)

# Bad
if(debug) plot (x, y)

Extra spacing (i.e., more than one space in a row) is okay if it improves alignment of equals signs or arrows (<-).

l <- list(total = a + b + c,
          mean  = (a + b + c) / n,
          name  = "abc")

Do not place spaces around code in parentheses or square brackets. The one exception is a space after a trailing comma, which emphasizes the missing index.

# Good
diamonds[5, 1]
diamonds[5, ]

# Bad
if ( debug )
x[1,]  # Needs a space after the comma
x[1 ,] # Space goes after, not before

Curly braces

There is a lot of variability in curly braces and indenting style. Below is what I do:

An opening curly brace never goes on its own line and is always followed by a new line; a closing curly brace always goes on its own line, unless followed by else.

Always indent the code inside the curly braces.

# Good
if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

# Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
} 
else {
  y ^ x
}
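
The placement of else is more than cosmetic. As a rough sketch (tryParse, good_code, and bad_code are illustrative names, not part of any package), parsing the two layouts as top-level code shows that R rejects an else that starts its own line, because the if statement is already complete at the closing brace:

```r
# Returns "parsed" if the text is syntactically valid R,
# otherwise "syntax error".
tryParse <- function(code_text) {
  res <- tryCatch({
    parse(text = code_text)
    "parsed"
  }, error = function(e) {
    "syntax error"
  })
  return(res)
}

good_code <- "if (y == 0) {\n  log(x)\n} else {\n  y ^ x\n}"
bad_code <- "if (y == 0) {\n  log(x)\n}\nelse {\n  y ^ x\n}"

good_result <- tryParse(good_code)
bad_result <- tryParse(bad_code)
```

Inside a function body or another pair of braces the second layout does parse, which is one more reason to use the first layout everywhere.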

It is ok to leave very short if statements on the same line:

if (y < 0 && debug) message("Y is negative")

Indentation

When indenting your code, use two (or four) spaces per level. Most of my code uses two spaces. Do not use tabs or mix tabs and spaces. Exception: when a line break occurs inside parentheses, align the wrapped line with the first character inside the parentheses. For example, if a function definition runs over multiple lines, indent the second line to where the argument list starts:

long_function_name <- function(a = "a long argument", b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Alternative to reduce indentation for long definitions. Keep "<-" at the
# end of the first line so that R reads the definition as one expression:

long_function_name <-
function(a = "a long argument", b = "another argument",
         c = "another long argument", d = TRUE, e = FALSE) {
  # As usual code is indented by two spaces.
}

Assignment

I prefer that you use <- rather than = for assignment. This keeps assignment distinct from argument passing in a function call (=) and from testing equality (==).

# Good
x <- 5
# Bad
x = 5
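
The two operators also behave differently inside a function call, which is one practical reason to keep them visually distinct. A small sketch (checkAssignment is a hypothetical helper written only for this illustration):

```r
# Reports whether a variable named x exists in the function's own
# environment after each style of call.
checkAssignment <- function() {
  # "=" only names the argument; no variable x is created here.
  mean(x = c(2, 4, 6))
  created_by_equals <- exists("x", envir = environment(), inherits = FALSE)

  # "<-" assigns x in this environment and then passes the value on.
  mean(x <- c(2, 4, 6))
  created_by_arrow <- exists("x", envir = environment(), inherits = FALSE)

  res <- c(equals = created_by_equals, arrow = created_by_arrow)
  return(res)
}

assignment_check <- checkAssignment()
```

The result is FALSE for the "=" call and TRUE for the "<-" call. Using <- only for assignment and = only for arguments means a reader never has to wonder which one was meant.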

Semicolons

Do not terminate your lines with semicolons or use semicolons to put more than one command on the same line.
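
A brief illustration, with made-up variable names:

```r
# Good
width_cm <- 5
height_cm <- 10

# Bad: runs the same way, but the line is harder to scan
width_in <- 2; height_in <- 4;
```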

Notation and naming

File names

Script file names should end in .R and be meaningful. Use "-" or "_" to separate words in file names.

# Good
read_licor_data.R
explore-bnames.R
Schwilk_Dylan-HW05.R
# Bad
foo.txt
my-homework
homework.doc

Identifiers

Variable names should be lowercase. Use underscores "_" to separate words within a name. Base R tends to use "." to separate words and that is ok, too. Generally, variable names should be nouns and function names should be verbs. I use separate naming conventions for functions vs other objects (see below). Strive for concise but meaningful names (this is not easy!). If the variable has a unit, that unit can be included as a suffix in the name. Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.

# Good variables
day_one
day_1
n_days  # 'n' indicates a count
height_cm # it can be useful to indicate units

# Bad variables
first_day_of_the_month
DayOne
dayone
djm1

Don't litter your code with numeric literals. If you need to hard code numbers (i.e., include numeric constants), then place these near the top of your script. In my own code, I prefer that such constants are UPPER_CASED.

# Good use of constants 
EARTH_RADIUS_KM <- 6371
# later in script
e <- estimateSpheroid(r=EARTH_RADIUS_KM*1000)


# Bad use of numeric literals
e <- estimateSpheroid(6371000)

Use verbs for function names when possible. I recommend using camelCase for functions to easily distinguish them from other objects, but this is very arbitrary.

# Good functions
countDays()
formatLine()

# bad
days()
fl()

Functions

Functions should have a single return() call, placed just before the final brace. There are times to break this rule, but try not to.

# good
sillyFunction <- function(x) {
  if (x < 0) {
    res <- "NEG"
  } else {
    res <- "POS"
  }
  if (x > 100) res <- paste("BIG", res)
  return(res)
}
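
For contrast, here is a sketch of the same logic written with several exit points (sillyFunctionMultiReturn is a hypothetical name used only for this comparison). It gives the same results, but a reader has to trace every branch to find out what is returned:

```r
# bad
sillyFunctionMultiReturn <- function(x) {
  if (x > 100) {
    return("BIG POS")
  }
  if (x < 0) {
    return("NEG")
  }
  return("POS")
}

multi_return_results <- c(sillyFunctionMultiReturn(-1),
                          sillyFunctionMultiReturn(5),
                          sillyFunctionMultiReturn(150))
```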

Script organization

Ordering

A suggested order of elements in an R script:

  • Copyright statement comment
  • Author comment
  • File description comment, including purpose of program, inputs, and outputs
  • source() and library() statements
  • Constant definitions if applicable
  • Function definitions
  • Executed statements, if applicable (e.g., print, plot)
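
A skeleton script that follows this order might look like the sketch below. Everything in it is a placeholder chosen for illustration: the header fields in angle brackets, the CM_PER_INCH constant, and the convertToCm function. The library(stats) line only marks where package loading goes, since stats is attached by default.

```r
# Copyright <year> <copyright holder>
# Author: <your name>
# Description: Converts a vector of heights from inches to centimeters.
#   Input: none (the example data are defined below).
#   Output: prints the converted heights.

library(stats)

## Constants
CM_PER_INCH <- 2.54

## Functions
convertToCm <- function(length_in) {
  res <- length_in * CM_PER_INCH
  return(res)
}

## Executed statements
heights_in <- c(60, 65, 70)
heights_cm <- convertToCm(heights_in)
print(heights_cm)
```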

Commenting guidelines

Comment your code. Comments should explain the why, not the what. Put a space after the comment character ("#"). Put comments on their own lines in most cases; add a comment to the end of a line only rarely, and when you do, use a single "#". Wrap your comments at 79 characters (Ctrl + Shift + / in RStudio). I usually use two "#" characters for regular comments on their own lines. This helps comments stand out and distinguishes informative comments from temporarily "commented out" code.

Use commented lines of - or = to break up your files into scannable chunks. See examples from my own code in the homework assignments.

Learn how to use your editor to wrap comments ("reflow comment", Ctrl + Shift + / in RStudio).
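
Put together, these conventions might look like the following sketch (the variable names and the scenario are invented for illustration):

```r
## ----------------------------------------------------------------------
## Unit conversion
## ----------------------------------------------------------------------

## Heights were recorded in inches in the field, but the analysis
## expects centimeters.
CM_PER_INCH <- 2.54
height_in <- c(12, 30)
height_cm <- height_in * CM_PER_INCH  # a rare end-of-line comment

## The single "#" below marks temporarily commented-out code.
# height_cm <- round(height_cm)
```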

Documenting functions using comments

It is important to document the arguments and the return value for a function. One place to do this is right above or at the beginning of a function. I recommend placing the "comment header" immediately before the function and following a consistent structure. An example of one such format:

# calculateSampleCovariance:
#   Computes the sample covariance between two vectors.
#
# Args:
#   x: One of two vectors whose sample covariance is to be calculated.
#   y: The other vector. x and y must have the same length, greater than one,
#      with no missing values.
#   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
#
# Returns:
#   The sample covariance between x and y.
calculateSampleCovariance <- function(x, y, verbose = TRUE) {
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y must have the same length, greater than one: ",
         length(x), " and ", length(y), ".")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose) cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  return(covariance)
}