Chapter 3 Indexing

3.1 Learning Objectives

  • Explain the difference between a list and a vector.
  • Explain the difference between indexing with [ and with [[.
  • Use [ and [[ correctly to extract elements and sub-structures from data structures in R.
  • Create a named list in R.
  • Access elements by name using both [ and $ notation.
  • Correctly identify cases in which back-quoting is necessary when accessing elements via $.
  • Create and index matrices in R.

One of the things that newcomers to R often trip over is the various ways in which structures can be indexed. All of the following are legal:

thing[i, j]
thing[[i, j]]

but they can behave differently depending on what kind of thing thing is. To explain, we must first take a look at lists.

3.2 How can I store a mix of different types of objects?

A list in R is a vector that can contain values of many different types. (The technical term for this is heterogeneous, in contrast with a homogeneous data structure that can only contain one type of value.) We’ll use this list in our examples:

thing <- list("first", c(2, 20, 200), 3.3)
[1] "first"

[1]   2  20 200

[1] 3.3

The output tells us that the first element of thing is a vector of one element, that the second is a vector of three elements, and the third is again a vector of one element.

3.3 What is the difference between [ and [[?

The output above strongly suggests that we can get the elements of a list using [[ (double square brackets):

[1] "first"
[1]   2  20 200
[1] 3.3

Let’s have a look at the types of those three values:

[1] "character"
[1] "double"
[1] "double"

Good: they are vectors. (Remember, everything in R is a vector—it doesn’t have scalars in the usual sense.) What do we get if we use single square brackets [?

[1] "list"

Sure enough, the value itself is a list:

[1] "first"

This shows the difference between [[ and [: the former peels away a layer of data structure, returning only the sub-structure, while the latter gives us back a structure of the same type as the thing being indexed. Since a “scalar” is just a vector of length 1, there is no difference between [[ and [ when they are applied to vectors:

v <- c("first", "second", "third")
[1] "second"
[1] "character"
[1] "second"
[1] "character"

Flattening and Recursive Indexing

If a list is just a vector of objects, why do we need the function list? Why can’t we create a list with c("first", c(2, 20, 200), 30)? The answer is that R flattens the arguments to c, so that c(c(1, 2), c(3, 4)) produces c(1, 2, 3, 4). It also does automatic type conversion: c("first", c(2, 20, 200), 30) produces a vector of character strings c("first", "2", "20", "200", "30"). This is helpful once you get used to it.

Another “helpful, ish” behavior is that using [[ with a list subsets recursively: if thing <- list(a = list(b = list(c = list(d = 1)))), then thing[[c("a", "b", "c", "d")]] selects the 1.

3.4 How can I access elements by name?

R allows us to name the elements in vectors: if we assign c(one = 1, two = 2, three = 3) to names, then names["two"] is 2. We can use this to create a lookup table:

values <- c("m", "f", "nb", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", nb = "Non-binary")
           m            f           nb            f            f 
      "Male"     "Female" "Non-binary"     "Female"     "Female" 
           m            m 
      "Male"       "Male" 

If the structure in question is a list rather than an atomic vector of numbers, characters, or logicals, we can use the syntax lookup$m instead of lookup["m"]:

lookup_list <- list(m = "Male", f = "Female", nb = "Non-binary")
[1] "Male"

We will explore this in more detail when we look at the tidyverse in Chapter 5, since that is where access-by-name is used most often. For now, simply note that if the name of an element isn’t a legal variable name, we have to put it in backward quotes to use it as an accessor:

another_list <- list("first field" = "F", "second field" = "S")
another_list$`first field`
[1] "F"

Wherever possible, it’s better to choose names that don’t require back-quoting, such as first_field.

3.5 How can I create and index a matrix?

Matrices are frequently used in statistics, so R provides built-in support for them. After a <- matrix(1:9, nrow = 3), a is a 3x3 matrix containing the values 1 through 9. What may surprise you is the order in which the values generated by the expression 1:9 are laid out:

a <- matrix(1:9, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Under the hood, a matrix is a vector with an attribute called dim that stores its dimensions:

[1] 3 3

a[3, 3] is a vector of length 1 containing the value 9 (again, “scalars” in R are actually vectors), while a[1,] is the vector c(1, 4, 7) (because we are selecting the first row of the matrix) and a[,1] is the vector c(1, 2, 3) (because we are selecting the first column of the matrix). Elements can still be accessed using a single index, which returns the value from that location in the underlying vector:

[1] 8

3.6 Key Points

  • A list is a heterogeneous vector capable of storing values of any type (including other lists).
  • Indexing with [ returns a structure of the same type as the structure being indexed (e.g., returns a list when applied to a list).
  • Indexing with [[ strips away one level of structure (i.e., returns the indicated element without any wrapping).
  • Use list('name' = value, ...) to name the elements of a list.
  • Use either L['name'] or L$name to access elements by name.
  • Use back-quotes around the name with $ notation if the name is not a legal R variable name.
  • Use matrix(values, nrow = N) to create a matrix with N rows containing the given values.
  • Use m[i, j] to get the value at the i’th row and j’th column of a matrix.
  • Use m[i,] to get a vector containing the values in the i’th row of a matrix.
  • Use m[,j] to get a vector containing the values in the j’th column of a matrix.