Chapter 4 Control Flow

4.1 Learning Objectives

  • Create for loops and if/else statements in R.
  • Explain why vectors cannot be used directly in conditional expressions and correctly use all and any to combine their values.
  • Define functions taking a fixed number of named arguments and/or a variable number of arguments.
  • Explain what vectorization is and create vectorized equivalents of unnested loops containing simple conditional tests.

Chapter 2 said that modern R strongly encourages people to write vectorized code. There are times, though, when we need to write loops and conditionals, and we should always break our code up into single-purpose functions.

4.2 How do I choose and repeat things?

We cherish the illusion of free will so much that we embed a pretense of it in our machines in the form of conditional statements using if and else. (Ironically, we then instruct those same machines to make the same decisions over and over. It’s no wonder they sometimes appear mad…) Here, for example, is a snippet of Python that displays the signs of a list of numbers:

values = [-15, 0, 15]
for v in values:
    if v < 0:
        sign = -1
    elif v == 0:
        sign = 0
    else:
        sign = 1
    print("The sign of", v, "is", sign)
The sign of -15 is -1
The sign of 0 is 0
The sign of 15 is 1

Its direct translation into R is:

values <- c(-15, 0, 15)
for (v in values) {
  if (v < 0) {
    sign <- -1
  }
  else if (v == 0) {
    sign <- 0
  }
  else {
    sign <- 1
  }
  print(glue::glue("The sign of {v} is {sign}"))
}
The sign of -15 is -1
The sign of 0 is 0
The sign of 15 is 1
print(glue::glue("The final value of v is {v}"))
The final value of v is 15

There are a few things to note here:

  1. This is not how we should write R: everything in this snippet can and should be vectorized.
  2. The parentheses in the loop header are required: we cannot simply write for v in values.
  3. The curly braces around the bodies of the conditional branches are optional, since each contains only a single statement. The braces around the body of the loop are needed here because it holds two statements (the conditional and the call to print). In either case, braces should always be there to help readability.
  4. The loop variable v persists after the loop is over.
  5. glue::glue (the function glue from the package of the same name) interpolates variables into strings in sensible ways. We will load this package and use plain old glue in the explanations that follow. (Note that R uses :: to get functions out of packages, where Python uses a dot.)
  6. By calling our temporary variable sign we have accidentally overwritten the rather useful built-in R function with that name. Name collisions of this sort are as easy in R as they are in Python.
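
As a hedged illustration of the first point, the loop above can be collapsed into a single vectorized expression. This is a minimal sketch that uses base R's ifelse and paste rather than glue so that it runs without extra packages. The names signs and messages are ours, chosen to avoid shadowing the built-in sign.

```r
values <- c(-15, 0, 15)

# ifelse tests every element at once and picks the matching branch for each.
signs <- ifelse(values < 0, -1, ifelse(values == 0, 0, 1))

# paste is vectorized too, so one call builds all three messages.
messages <- paste("The sign of", values, "is", signs)
print(messages)

# The built-in sign function gives the same answer in one step.
print(identical(signs, sign(values)))
```

No loop variable is left behind afterwards, and nothing overwrites the built-in sign.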

4.3 How can I express a range of values in R?

By default, R’s for loop gives us the values in a vector, just as Python’s does. If we want to loop over the indices instead, we can use the function seq_along:

colors <- c("eburnean", "glaucous", "wenge")
for (i in seq_along(colors)) {
  print(glue("The length of color {i} is {length(colors[i])}"))
}
The length of color 1 is 1
The length of color 2 is 1
The length of color 3 is 1

This output makes no sense until we remember that every value is a vector, and that length returns the length of a vector, so that length(colors[i]) is telling us that colors[i] contains one element. If we want the number of characters in the strings, we can use R’s built-in nchar or the more modern function stringr::str_length:

for (i in seq_along(colors)) {
  print(glue("The length of color {i} is {stringr::str_length(colors[i])}"))
}
The length of color 1 is 8
The length of color 2 is 8
The length of color 3 is 5
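
The same counts can be had without a loop, since the built-in nchar mentioned above is itself vectorized. The following is a small sketch in base R only; the variable name chars_per_color is ours.

```r
colors <- c("eburnean", "glaucous", "wenge")

# nchar returns one character count per string in the vector.
chars_per_color <- nchar(colors)
print(chars_per_color)

# length, by contrast, counts the elements of the vector itself.
print(length(colors))
```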

seq_along returns a vector containing a sequence of integers:

seq_along(colors)
[1] 1 2 3

Since sequences of this kind are used frequently, R lets us write them using range expressions like this:

5:10
[1]  5  6  7  8  9 10

Their most common use is as indices to vectors:

colors <- c("eburnean", "glaucous", "squamous", "wenge")
colors[1:3]
[1] "eburnean" "glaucous" "squamous"

We can similarly use a range of negative indices to leave elements out:

colors[-1:-3]
[1] "wenge"

However, R does not allow tripartite expressions of the form start:end:stride. For that, we must use the seq function:

seq(1, 10, 3)
[1]  1  4  7 10

This example also shows that ranges in R are inclusive at both ends, i.e., they run up to and including the upper bound. As is traditional among programming language advocates, people claim that this is more natural and then cite some supportive anecdote as proof.
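
The sketch below, which is ours rather than the original author's, pulls these points together. It also shows one practical consequence of inclusive ranges that can count downwards: 1:length(x) misbehaves when x is empty, which is one reason to prefer seq_along in loops. The variable names are illustrative.

```r
# Both ends are included, so 1:3 has three elements.
print(length(1:3))

# seq accepts a stride (by) or a desired number of values (length.out).
by_three <- seq(from = 1, to = 10, by = 3)
five_steps <- seq(from = 0, to = 1, length.out = 5)
print(by_three)
print(five_steps)

# Ranges can count downwards, which is a trap for empty vectors.
empty <- character(0)
print(1:length(empty))   # the two values 1 and 0, so a loop would run twice
print(seq_along(empty))  # no values at all, so a loop would not run
```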

Repeating Things

The function rep repeats things, so rep("a", 3) is c("a", "a", "a"). If the second argument is a vector of the same length as the first, it specifies how many times each item in the first vector is to be repeated: rep(c("a", "b"), c(2, 3)) is c("a", "a", "b", "b", "b").
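
As a short illustration, the two behaviors described above can be spelled out with rep's times argument, and compared with its each argument, which the text does not cover. The variable names are ours.

```r
# A single count repeats the whole vector.
whole <- rep(c("a", "b"), times = 2)

# each repeats every element in place before moving on to the next.
in_place <- rep(c("a", "b"), each = 2)

# A vector of counts gives one count per element.
per_item <- rep(c("a", "b"), times = c(2, 3))

print(whole)
print(in_place)
print(per_item)
```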

4.4 How can I use a vector in a conditional statement?

We cannot use vectors directly as a condition in an if statement:

numbers <- c(0, 1, 2)
if (numbers) {
  print("This should not work.")
}
Warning in if (numbers) {: the condition has length > 1 and only the first
element will be used

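(The message above comes from older versions of R; since R 4.2.0 a condition of length greater than one is an error rather than a warning.) Instead, we must collapse the vector into a single logical value: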

numbers <- c(0, 1, 2)
if (all(numbers >= 0)) {
  print("This, on the other hand, should work.")
}
[1] "This, on the other hand, should work."

The function all returns TRUE if every element in its argument is TRUE; it corresponds to a logical “and” of all its inputs. We can use a corresponding function any to check if at least one value is TRUE, which corresponds to a logical “or” across the whole input.
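
A brief sketch of the two functions side by side may help. The variable names are ours, and the last two lines show the edge case of an empty input.

```r
numbers <- c(0, 1, 2)

all_positive <- all(numbers > 0)   # FALSE, because 0 is not positive
any_positive <- any(numbers > 0)   # TRUE, because 1 and 2 are

if (any_positive && !all_positive) {
  print("Some, but not all, of the numbers are positive.")
}

# On an empty input, all is TRUE and any is FALSE.
print(all(logical(0)))
print(any(logical(0)))
```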

4.5 How do I create and call functions?

As we have already seen, we call functions in R much as we do in Python:

max(1, 3, 5) + min(1, 3, 5)
[1] 6

We define a new function using the function keyword. This creates the function, but does not name it; to accomplish that, we must assign the newly-created function to a variable:

swap <- function(pair) {
  c(pair[2], pair[1])
}
swap(c("left", "right"))
[1] "right" "left" 

As this example shows, the result of a function is the value of the last expression evaluated within it. A function can return a value earlier using the return function; we can use return for the final value as well, but most R programmers do not.

swap <- function(pair) {
  if (length(pair) != 2) {
    return(NULL) # This is very bad practice.
  }
  c(pair[2], pair[1])
}
swap(c("one"))
NULL
swap(c("left", "right"))
[1] "right" "left" 

Returning NULL when our function’s inputs are invalid as we have done above is foolhardy, as doing so means that swap can fail without telling us that it has done so. Consider:

NULL[1]                 # Try to access an element of the vector that does not exist.
NULL
values <- 5:10          # More than two values.
result <- swap(values)  # Attempting to swap the values produces NULL.
result[1]               # But we can operate on the result without error.
NULL

We will look at what we should do instead in Chapter 10.
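
In the meantime, one common base R alternative is to signal an error with stop. The sketch below is only an illustration under that assumption and is not necessarily the approach the later chapter takes. The name swap_strict is hypothetical.

```r
swap_strict <- function(pair) {
  if (length(pair) != 2) {
    # stop pastes its arguments together and raises an error.
    stop("swap_strict requires exactly two values, got ", length(pair))
  }
  c(pair[2], pair[1])
}

# tryCatch lets us capture the error message instead of halting the script.
outcome <- tryCatch(
  swap_strict(5:10),
  error = function(e) conditionMessage(e)
)
print(outcome)
print(swap_strict(c("left", "right")))
```

The failure is now impossible to miss: code that calls swap_strict with bad input stops unless it handles the error explicitly.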

4.6 How can I write a function that takes a varying number of arguments?

If the number of arguments given to a function is not the number expected, R complains:

swap("one", "two", "three")
Error in swap("one", "two", "three"): unused arguments ("two", "three")

(Note that in this example we are passing three values, not a single vector containing three values.) If we want a function to handle a varying number of arguments, we represent the “extra” arguments with an ellipsis ... (three dots), which serves the same purpose as Python’s *args:

print_with_title <- function(title, ...) {
  print(glue("=={title}=="), paste(..., sep = "\n"))
}

print_with_title("to-do", "Monday", "Tuesday", "Wednesday")
==to-do==
Monday
Tuesday
Wednesday

(The function paste combines multiple arguments with the specified separator.)

R has a special data structure to represent the extra arguments in .... If we want to work with those arguments one by one, we must convert ... to a list:

add <- function(...) {
  result <- 0
  for (value in list(...)) {
    result <- result + value
  }
  result
}
add(1, 3, 5, 7)
[1] 16
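
Extra arguments can also be given names, much like Python's **kwargs, and list(...) keeps those names. The following sketch assumes every extra argument is a single value. The function name describe and the "<unnamed>" label are ours.

```r
describe <- function(...) {
  extras <- list(...)
  labels <- names(extras)
  # names returns NULL when no argument was named at all.
  if (is.null(labels)) {
    labels <- rep("", length(extras))
  }
  labels[labels == ""] <- "<unnamed>"
  # Assumes each extra argument is a single value.
  values <- vapply(extras, as.character, character(1))
  unname(paste0(labels, "=", values))
}

described <- describe(1, b = 2, "three")
print(described)
```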

4.7 How can I provide default values for arguments?

Like Python and most other modern programming languages, R lets us define default values for arguments and then pass arguments by name:

example <- function(first, second = "second", third = "third") {
  print(glue("first='{first}' second='{second}' third='{third}'"))
}

example("with just first")
first='with just first' second='second' third='third'
example("with first and second by position", "positional")
first='with first and second by position' second='positional' third='third'
example("with first and third by name", third = "by name")
first='with first and third by name' second='second' third='by name'
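
Named arguments do not have to appear in the order in which they were declared. The sketch below uses a stand-in function called show_args (a hypothetical name) that returns its string via base R's sprintf instead of printing with glue, so that it runs without extra packages.

```r
show_args <- function(first, second = "second", third = "third") {
  sprintf("first='%s' second='%s' third='%s'", first, second, third)
}

# Named arguments can come in any order.
reordered <- show_args(third = "c", first = "a")

# Positional arguments then fill the remaining parameters from left to right.
mixed <- show_args(second = "b", "a")

print(reordered)
print(mixed)
```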

One caution: when you use a name in a function call, R ignores non-function objects when figuring out what function to call. For example, the call orange() in the code below produces 110 because purple(purple) is interpreted as “pass the value of the local variable purple into the globally-defined function purple”:

purple <- function(x) x + 100
orange <- function() {
  purple <- 10
  purple(purple)
}
orange()
[1] 110

4.8 How can I hide the value that R returns?

If the value returned by a function isn’t assigned to something, R will print it out. This usually isn’t what we want in library functions, so we can use the function invisible to mark a value so that it won’t be printed by default (but can still be assigned). This allows us to convert this:

something <- function(value) {
  10 * value
}
something(2)
[1] 20

to this:

something <- function(value) {
  invisible(10 * value)
}
something(2)

The calculation is still done, but the output is suppressed.
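
A short sketch shows that the value is still there to be used. It relies only on base R; the variable names result and flagged are ours.

```r
something <- function(value) {
  invisible(10 * value)
}

# Assignment captures the value as usual.
result <- something(2)
print(result)

# Wrapping the call in parentheses makes R print the value after all.
(something(2))

# withVisible reports whether a value was flagged as invisible.
flagged <- withVisible(something(2))
print(flagged$visible)
```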

4.9 How can I assign to a global variable from inside a function?

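The assignment operator <<- means “assign to a variable outside the current scope”: R searches the enclosing scopes for an existing variable with that name and, if it finds none, creates one in the global environment. As the example below shows, this means that what looks like creation of a new local variable can actually be modification of a global one: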

var <- "original value"

demonstrate <- function() {
  var <<- "new value"
}

demonstrate()
var
[1] "new value"

This should be done rarely, and always with care: modern R strongly encourages a functional style of programming in which functions do not modify their input data, and nobody thinks that modifying global variables is a good idea any more.
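
The variable that <<- changes need not be global: the operator assigns to the nearest enclosing scope that already defines the name. The sketch below illustrates this with a function that creates counters. The names make_counter and count are hypothetical.

```r
make_counter <- function() {
  count <- 0
  function() {
    # count lives in make_counter's scope, not in the global environment.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
print(c(first, second))

# Each call to make_counter creates its own independent count.
other <- make_counter()
print(other())
```

Confining the modified state to an enclosing function in this way is generally easier to reason about than changing a global variable.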

4.10 Key Points

  • Use for (loop_variable in collection) { ...body... } to create a loop.
  • Use if (expression) { ...body... } else if (expression) { ...body... } else { ...body... } to create conditionals.
  • Conditions must have length 1; use any(...) and all(...) to collapse logical vectors to single values.
  • Use function(...arguments...) { ...body... } to create a function.
  • Use variable <- function(...arguments...) { ...body... } to create a function and give it a name.
  • The body of a function can be a single expression or a block in curly braces.
  • The last expression evaluated in a function is returned as its result.
  • Use return(expression) to return a result early from a function.