R Style Guide

Why have a style guide?

A "coding style" is a set of guidelines for writing readable code. Style covers the things that do not matter to the computer: the R interpreter does not care about white space, for example, but human readers do. Good style is important because, while your code may have only one author, it will usually have multiple readers. Style guidelines, also known as "coding conventions", are especially important when working on projects with multiple people. Collaboration is why R is so popular, so make your code easy to read!

That said, many style guidelines are arbitrary and "good style" is subjective. Here are some arbitrary guidelines that I think produce readable R code. These guidelines are based on Google's R style guide, Hadley Wickham's tweaks to them and my own R coding style.

Notation and naming

File names

Script file names should end in .R and be meaningful. Use "-" or "_" to separate words in file names.

# Good
explore-bnames.R
schwilk-hw-1.R
# Bad
foo.txt
my-homework
homework.doc

Identifiers

Variable names should be lowercase. Use underscores "_" to separate words within a name. Generally, variable names should be nouns and function names should be verbs. Strive for concise but meaningful names (this is not easy!). If the variable has a unit, that unit should be included as a suffix in the name. Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.

# Good variables
day_one
day_1
n_days  # 'n' indicates a count
height_cm # it can be useful to indicate units

# Bad variables
first_day_of_the_month
DayOne
dayone
djm1

Variables representing constants should be UPPER_CASED.

EARTH_RADIUS_KM <- 6371

Make function names verbs! I recommend using camelCase for functions to easily distinguish them from other objects.

# Good functions
countDays()
formatLine()

# Bad functions
days()
fl()

Syntax

This is the important part of style.

Boolean values

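Use TRUE and FALSE for boolean values, not the abbreviations T and F. TRUE and FALSE are reserved words, but T and F are ordinary variables that can be reassigned.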

# good:
c(TRUE, FALSE)

# bad:
c(T, F) # dangerous!
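
To see why the abbreviations are dangerous, here is a minimal sketch (the variable names used_t and assign_result are illustrative only). It reassigns T, shows that code relying on it silently changes behavior, and checks that the same trick fails for TRUE:

```r
# T and F are regular variables in base R, not reserved words.
# This assignment is legal and silently changes what T means.
T <- 0
used_t <- if (T) "yes" else "no"

# TRUE and FALSE are reserved words; assigning to them is an error.
assign_result <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error"
)

# Remove the mask so that T refers to the base R value again.
rm(T)
```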

Spacing

Place spaces around all binary operators (==, +, -, <-, =, etc.). Do not place a space before a comma, but always place one after a comma. It is especially important to place spaces around <- to avoid ambiguity, e.g., y<-3: is that a comparison (y < -3) or an assignment (y <- 3)?

# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
# Bad
average<-mean(feet/12+inches,na.rm=T)
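
The ambiguity around <- is real, not just cosmetic. In this small sketch (the names after_unspaced and is_below are illustrative), the unspaced form is read by R as an assignment, which may not be what the author meant:

```r
y <- 5

# Without spaces, R reads "<-" as the assignment arrow.
y<-3
# y has been overwritten with 3.
after_unspaced <- y

# With spaces, the intent is unambiguous.
y <- 5
# Comparison: is y less than negative three?
is_below <- y < -3
```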

It is OK to break the rule above occasionally to group items. For example, I think exponentiation is easiest to read without spaces, and sometimes tighter spacing can emphasize order of operations, e.g., square <- x^2 + y*3. I also think it looks fine to have no space around the "=" sign when passing arguments in a function call.

Place a space before left parentheses, except in a function call.

# Good
if (debug)
plot(x, y)

# Bad
if(debug)
plot (x, y)

Extra spacing (i.e., more than one space in a row) is okay if it improves alignment of equals signs or arrows (<-).

l <- list(total = a + b + c,
          mean  = (a + b + c) / n,
          name  = "abc")

Do not place spaces around code in parentheses or square brackets. (Except if there’s a trailing comma: always place a space after a comma, just like in ordinary English.)

# Good
diamonds[5, 1]
diamonds[5, ]

# Bad
if ( debug )
x[1,]  # Needs a space after the comma
x[1 ,] # Space goes after, not before

Curly braces

An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else.

Always indent the code inside the curly braces.

# Good
if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

# Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
} 
else {
  y ^ x
}
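
The misplaced else in the bad example is more than a style problem. When such code is run at the top level of a script (outside any enclosing braces), R treats the if statement as complete at the closing brace and then rejects the stray else. A minimal sketch that checks this with parse(), which only inspects syntax; the helper parsesCleanly is hypothetical:

```r
bad_code <- "if (y == 0) {\n  log(x)\n}\nelse {\n  y ^ x\n}"
good_code <- "if (y == 0) {\n  log(x)\n} else {\n  y ^ x\n}"

# parse() only checks syntax; it does not evaluate x or y.
parsesCleanly <- function(code) {
  parsed <- tryCatch(parse(text = code), error = function(e) NULL)
  return(!is.null(parsed))
}

bad_parses <- parsesCleanly(bad_code)
good_parses <- parsesCleanly(good_code)
```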

It is ok to leave very short statements on the same line:

if (y < 0 && debug) message("Y is negative")

Indentation

When indenting your code, use two (or 4) spaces per level. Do not use tabs or mix tabs and spaces. Exception: When a line break occurs inside parentheses, align the wrapped line with the first character inside the parenthesis. For example, if a function definition runs over multiple lines, indent the second line to where the definition starts:

long_function_name <- function(a = "a long argument", b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# alternative to reduce lines for long definitions:

long_function_name <-
  function(a = "a long argument", b = "another argument",
           c = "another long argument", d = TRUE, e = FALSE) {
  # As usual code is indented by two spaces.
}

Line length

Keep your lines shorter than 80 characters. Longer lines are difficult to read.
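
One way to stay within the limit is to break a long call inside its parentheses and align the wrapped arguments, as described under Indentation. A sketch with made-up data (plant_heights_cm and height_summary are illustrative names):

```r
# Hypothetical data, for illustration only.
plant_heights_cm <- c(12.5, 15.1, NA, 9.8, 14.2, 11.0)

# One long line (over 80 characters) is hard to scan:
# height_summary <- quantile(plant_heights_cm, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE)

# The same call, wrapped inside the parentheses and aligned:
height_summary <- quantile(plant_heights_cm,
                           probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
                           na.rm = TRUE)
```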

Assignment

Use "<-", not "=", for assignment.

# Good
x <- 5
# Bad
x = 5

Semicolons

Do not terminate your lines with semicolons or use semicolons to put more than one command on the same line.
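
A short illustration, with made-up variable names:

```r
# Bad: two commands on one line, each terminated with a semicolon
# n_days <- 7; total_hours <- n_days * 24;

# Good: one command per line, no semicolons
n_days <- 7
total_hours <- n_days * 24
```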

Functions

Functions should have a single return() call, placed just before the final brace. There are times to break this rule, but try not to.

# good
isNegative <- function(x) {
  if (x < 0) {
    is_neg <- TRUE
  } else {
    is_neg <- FALSE
  }
  return(is_neg)
}

# bad
isNegative <- function(x) {
  if (x < 0) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

Organization

Ordering

A suggested order of elements in an R script:

  • Copyright statement comment
  • Author comment
  • File description comment, including purpose of program, inputs, and outputs
  • source() and library() statements
  • Function definitions
  • Executed statements, if applicable (e.g., print, plot)
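
As a rough sketch, a small script following this order might look like the one below. The placeholders in angle brackets, the file name helper-functions.R, and all object names are hypothetical:

```r
# Copyright (c) <year> <copyright holder>
# Author: <author name>
#
# Description: Converts a vector of heights from inches to centimeters and
#   prints a summary. Inputs: none (values are hard-coded). Outputs: a
#   summary printed to the console.

# stats is attached by default; it stands in for the packages you need.
library(stats)
# Hypothetical helper file, commented out so the sketch runs on its own.
# source("helper-functions.R")

CM_PER_INCH <- 2.54

convertToCm <- function(height_in) {
  height_cm <- height_in * CM_PER_INCH
  return(height_cm)
}

heights_cm <- convertToCm(c(60, 65, 70))
print(summary(heights_cm))
```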

Commenting guidelines

Comment your code. Entire commented lines should begin with # (or ##) and one space. Comments should explain the why, not the what. Rarely add a comment to the end of a line; usually you should put comments on their own lines.

Use commented lines of - and = to break up your files into scannable chunks.
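
For example, a script might be divided like this. The data are made up, and the use of -99 as a code for missing values is an assumption for illustration:

```r
## ---------------------------------------------------------------------------
## Read and clean data
## ---------------------------------------------------------------------------

# Hypothetical measurements; -99 is assumed to be the field code for "missing".
raw_heights_cm <- c(150.2, -99, 171.5, 163.0)

# Recode before averaging so the placeholder does not bias the mean.
heights_cm <- ifelse(raw_heights_cm == -99, NA, raw_heights_cm)

## ===========================================================================
## Analysis
## ===========================================================================

mean_height_cm <- mean(heights_cm, na.rm = TRUE)
```

Note that the comments say why each step is there (what -99 means, why it is recoded) and do not restate what the code does.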

Documenting functions using comments

It is important to document the arguments and the return value for a function. One place to do this is right above or at the beginning of a function. I recommend placing the "comment header" immediately before the function and following a consistent structure. Example:

# calculateSampleCovariance:
#   Computes the sample covariance between two vectors.
#
# Args:
#   x: One of two vectors whose sample covariance is to be calculated.
#   y: The other vector. x and y must have the same length, greater than one,
#      with no missing values.
#   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
#
# Returns:
#   The sample covariance between x and y.
calculateSampleCovariance <- function(x, y, verbose = TRUE) {
  n <- length(x)
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y must have the same length, greater than one: ",
         length(x), " and ", length(y), ".")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose) {
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  }
  return(covariance)
}

Writing documentation accessible with ? or help()

I won't ask you to do this in this course, but it may be useful to know how one writes documentation the "R way". The documentation accessible with the help functions is stored in .Rd files in the man/ directory of a package. Although you can write these by hand, a better way is to use function comment headers with a specific structure that the roxygen2 package can parse and use to automatically create the appropriate .Rd files. For more information on this advanced topic see http://r-pkgs.had.co.nz/man.html
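
To give a flavor of the format, here is a minimal sketch of a roxygen-style header. The function calculateCovariance is a simplified, hypothetical cousin of the example above. The header lines start with #' so R itself treats them as ordinary comments; roxygen2 would turn them into an .Rd file only when run inside a package:

```r
#' Compute the sample covariance between two vectors
#'
#' @param x A numeric vector.
#' @param y A numeric vector of the same length as x.
#' @return The sample covariance between x and y.
#' @examples
#' calculateCovariance(c(1, 2, 3), c(2, 4, 6))
calculateCovariance <- function(x, y) {
  covariance <- var(x, y)
  return(covariance)
}
```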
