4. Indexing

There are multiple ways to access or replace values in vectors or other data structures. The most common approach is to use “indexing”. This is also referred to as “slicing”.

Note that brackets [ ] are used for indexing, whereas parentheses ( ) are used to call a function.

Vector

Here are some examples that show how elements of vectors can be obtained by indexing.

b <- 10:15
b
## [1] 10 11 12 13 14 15
# get the first element
b[1]
## [1] 10
# the second element
b[2]
## [1] 11
# elements 2 to 3
b[2:3]
## [1] 11 12

Now a more advanced example: return all elements except the second.

b[-2]
## [1] 10 12 13 14 15
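
Negative indices also work with more than one position. The lines below are a small sketch of that; v is a stand-in vector introduced only for this illustration, with the same values as b.

```r
# 'v' is a hypothetical stand-in with the same values as b
v <- 10:15

# drop the first two elements
v[-c(1, 2)]
## [1] 12 13 14 15

# drop elements 2 to 4; the parentheses matter, because -2:4
# would be the sequence -2, -1, 0, ..., 4 instead
v[-(2:4)]
## [1] 10 14 15

# indexing does not change the vector itself
v
## [1] 10 11 12 13 14 15
```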

You can also use an index to change values.

b[1] <- 11
b
## [1] 11 11 12 13 14 15
b[3:6] <- -99
b
## [1]  11  11 -99 -99 -99 -99

An important characteristic of R's vectorization system is that shorter vectors are ‘recycled’. That is, they are repeated until the necessary number of elements is reached. This applies in many circumstances, and is very practical when you are aware of it. It may, however, also lead to undetected errors when recycling happens where you did not intend it.

Here you see recycling at work. First we assign a single number to the first three elements of b, so the number is used three times. Then we assign two numbers to elements 3 to 6, so that each number is used twice.

b[1:3] <- 2
b
## [1]   2   2   2 -99 -99 -99
b[3:6] <- c(10,20)
b
## [1]  2  2 10 20 10 20
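
Recycling is silent when the number of positions is a multiple of the number of replacement values. When it is not, R still recycles, but it also issues a warning. A minimal sketch, using a stand-in vector v that is not part of the examples above:

```r
# 'v' is a hypothetical vector of six zeros
v <- rep(0, 6)

# three positions but only two values: 3 is not a multiple of 2,
# so R warns, and uses the first value a second time
v[1:3] <- c(10, 20)
v
## [1] 10 20 10  0  0  0
```

A warning like this is usually a sign that the two sides of the assignment do not have the lengths you expected.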

Matrix

Consider matrix m.

m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
m
##      a b c
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9

Like vectors, values of matrices can be accessed through indexing. There are different ways to do this, but it is generally easiest to use two numbers in a double index, the first for the row number(s) and the second for the column number(s).

# one value
m[2,2]
## b
## 5
# another one
m[1,3]
## c
## 3

You can also get multiple values at once.

# two rows and two columns
m[1:2,1:2]
##      a b
## [1,] 1 2
## [2,] 4 5

# entire row
m[2, ]
## a b c
## 4 5 6

# entire column
m[ ,2]
## [1] 2 5 8

Or use the column names for subsetting.

# single column
m[, 'b']
## [1] 2 5 8
# two columns
m[, c('a', 'c')]
##      a c
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9

Instead of indexing with two numbers, you can also use a single number. You can think of this as a “cell number”. Cells are numbered column-wise (i.e., first the rows in the first column, then the second column, etc.). Thus,

m[2,2]
## b
## 5
# is equivalent to
m[5]
## [1] 5
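
The relation between a row and column pair and its cell number can be written out explicitly. The sketch below assumes a matrix mm, and the helper names row_nr, col_nr and cell, all of which are introduced here only for illustration.

```r
# 'mm' is a hypothetical 3 x 3 matrix without column names
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)

row_nr <- 2
col_nr <- 3

# cells are counted down the columns, so skip (col_nr - 1) full columns
cell <- (col_nr - 1) * nrow(mm) + row_nr
cell
## [1] 8

# both ways of indexing give the same value
mm[cell] == mm[row_nr, col_nr]
## [1] TRUE

# arrayInd goes the other way, from cell number to row and column
arrayInd(cell, dim(mm))
##      [,1] [,2]
## [1,]    2    3
```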

Note that

m[ ,2]
## [1] 2 5 8

returns a vector. This is because a single-column matrix can be simplified to a vector. In that case the matrix structure is ‘dropped’. This is not always desirable, and to prevent it you can use the drop=FALSE argument.

m[ , 2, drop=FALSE]
##      b
## [1,] 2
## [2,] 5
## [3,] 8
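
One way to check whether the structure was dropped is to look at the dimensions of the result. A short sketch, with mm as a stand-in matrix used only here:

```r
# 'mm' is a hypothetical 3 x 3 matrix
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)

# a plain vector has no dimensions
dim(mm[, 2])
## NULL

# with drop=FALSE the result is still a matrix, with 3 rows and 1 column
dim(mm[, 2, drop=FALSE])
## [1] 3 1

# the same holds for a single row
is.matrix(mm[2, , drop=FALSE])
## [1] TRUE
```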

Setting values of a matrix is similar to how you would do that for a vector, except that you now need to deal with two dimensions.

# one value
m[1,1] <- 5
m
##      a b c
## [1,] 5 2 3
## [2,] 4 5 6
## [3,] 7 8 9
# a row
m[3,] <- 10
m
##       a  b  c
## [1,]  5  2  3
## [2,]  4  5  6
## [3,] 10 10 10
# two columns, with recycling
m[,2:3] <- 3:1
m
##       a b c
## [1,]  5 3 3
## [2,]  4 2 2
## [3,] 10 1 1

There is a function to get (or set) the values on the diagonal.

diag(m)
## [1] 5 2 1
diag(m) <- 0
m
##       a b c
## [1,]  0 3 3
## [2,]  4 0 2
## [3,] 10 1 0

List

Indexing lists can be a bit confusing, because you can refer either to the elements of the list or to the elements of the data (perhaps a matrix) inside one of the list elements. Note the difference that double brackets make: for the list e created below, e[3] returns a list (of length 1), but e[[3]] returns what is inside that list element (a matrix in this case).

m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
e <- list(list(1:3), c('a', 'b', 'c', 'd'), m)

We can access data inside a list element by combining double and single brackets. By using the double brackets, the list structure is dropped.

e[2]
## [[1]]
## [1] "a" "b" "c" "d"
e[[2]]
## [1] "a" "b" "c" "d"
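
To reach a value inside a list element, the double brackets can be followed by a further index. The sketch below rebuilds the same m and e so that it can be run on its own.

```r
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
e <- list(list(1:3), c('a', 'b', 'c', 'd'), m)

# third value of the character vector in list element 2
e[[2]][3]
## [1] "c"

# row 2, column 'b' of the matrix in list element 3
e[[3]][2, 'b']
## b
## 5

# element 1 is itself a list, so double brackets are needed twice
e[[1]][[1]][2]
## [1] 2
```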

List elements can have names.

names(e) <- c('zzz', 'xyz', 'abc')

And the elements can be extracted by their name, either as an index, or by using the $ (dollar) operator.

e$xyz
## [1] "a" "b" "c" "d"
e[['xyz']]
## [1] "a" "b" "c" "d"

The $ can also be used with data.frame objects (a special list, after all), but not with matrices.
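
The restriction for matrices can be seen in a small experiment. In the sketch below, mm is a stand-in matrix, and tryCatch is used only so that the error does not stop the script.

```r
# 'mm' is a hypothetical 2 x 2 matrix with column names
mm <- matrix(1:4, nrow=2)
colnames(mm) <- c('a', 'b')

# the dollar operator fails on a matrix; catch the error and report it
res <- tryCatch(mm$a, error = function(err) "error")
res
## [1] "error"

# indexing by column name works
mm[, 'a']
## [1] 1 2
```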

Data.frame

Indexing a data.frame can generally be done as for matrices and for lists.

First create a data.frame from matrix m.

d <- data.frame(m)
class(d)
## [1] "data.frame"

You can extract a column by column number.

d[,2]
## [1] 2 5 8

Here is an alternative way to address the column number in a data.frame.

d[2]
##   b
## 1 2
## 2 5
## 3 8

Note that whereas [2] would be the second element in a matrix, it refers to the second column in a data.frame. This is because a data.frame is a special kind of list and not a special kind of matrix.
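
The difference is easy to verify side by side. In this sketch, mm and dd are stand-ins for a matrix and the data.frame made from it.

```r
# 'mm' and 'dd' are hypothetical stand-ins for m and d
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(mm) <- c('a', 'b', 'c')
dd <- data.frame(mm)

# second cell of the matrix, counting down the first column
mm[2]
## [1] 4

# second column of the data.frame, still a data.frame
class(dd[2])
## [1] "data.frame"

# a matrix has one element per cell, a data.frame one per column
length(mm)
## [1] 9
length(dd)
## [1] 3
```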

You can also use the column name to get values. This approach also works for a matrix.

d[, 'b']
## [1] 2 5 8

But with a data.frame you can also do

d$b
## [1] 2 5 8
# or this
d[['b']]
## [1] 2 5 8

All these return a vector. That is, the complexity of the data.frame structure was dropped. This does not happen when you do

d['b']
##   b
## 1 2
## 2 5
## 3 8

or

d[ , 'b', drop=FALSE]
##   b
## 1 2
## 2 5
## 3 8

Why should you care about this drop business? Well, in many cases R functions want a specific data type, such as a matrix or data.frame, and report an error if they get something else. One common situation is that you think you provide data of the right type, such as a data.frame, but in fact you are providing a vector, because the structure was dropped.
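
As one possible illustration, colMeans expects something with rows and columns. The sketch below uses a small made-up data.frame dd, and tryCatch only to keep the error from stopping the script.

```r
# 'dd' is a hypothetical data.frame for this illustration
dd <- data.frame(a=1:3, b=4:6)

# the column was dropped to a vector, so colMeans fails
res <- tryCatch(colMeans(dd[, 'b']), error = function(err) "error")
res
## [1] "error"

# with drop=FALSE the data.frame structure is kept and the call works
colMeans(dd[, 'b', drop=FALSE])
## b
## 5
```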

which, %in% and match

Sometimes you do not have the indices you need, and so you need to find them. For example, what are the indices of the elements in a vector that have values above 15?

x <- 10:20
i <- which(x > 15)
x
##  [1] 10 11 12 13 14 15 16 17 18 19 20
i
## [1]  7  8  9 10 11
x[i]
## [1] 16 17 18 19 20

Note, however, that you can also use a logical vector for indexing (values for which the index is TRUE are returned).

x <- 10:20
b <- x > 15
x
##  [1] 10 11 12 13 14 15 16 17 18 19 20
b
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[b]
## [1] 16 17 18 19 20
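
Logical tests can be combined, and a logical index can also be used to replace values. A brief sketch with a stand-in vector v:

```r
# 'v' is a hypothetical copy of the values in x
v <- 10:20

# two conditions combined with & (and)
v[v > 12 & v < 16]
## [1] 13 14 15

# even numbers only
v[v %% 2 == 0]
## [1] 10 12 14 16 18 20

# replace all values above 18
v[v > 18] <- 0
v
##  [1] 10 11 12 13 14 15 16 17 18  0  0
```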

A very useful operator that allows you to ask whether a set of values is present in a vector is %in%.

x <- 10:20
j <- c(7,9,11,13)
j %in% x
## [1] FALSE FALSE  TRUE  TRUE
which(j %in% x)
## [1] 3 4
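
Because %in% returns a logical vector, it can be used directly as an index. The sketch below uses x2 and j2, copies of the values above under different names.

```r
# 'x2' and 'j2' are hypothetical copies of x and j
x2 <- 10:20
j2 <- c(7, 9, 11, 13)

# the values of j2 that occur in x2
j2[j2 %in% x2]
## [1] 11 13

# and those that do not
j2[!(j2 %in% x2)]
## [1] 7 9
```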

Another handy, similar function is match:

match(j, x)
## [1] NA NA  2  4

telling us that the third value in j is equal to the second value in x and that the fourth value in j is equal to the fourth value in x.

match is asymmetric: match(j,x) is not the same as match(x,j).

match(x, j)
##  [1] NA  3 NA  4 NA NA NA NA NA NA NA

This tells us that the second value in x is equal to the third value in j, etc.
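
A common use of match is to look up values in a second vector. The sketch below uses made-up data; the names codes, countries, obs and idx are all assumptions for this illustration.

```r
# hypothetical lookup table: codes and the matching country names
codes <- c('NL', 'KE', 'PE')
countries <- c('Netherlands', 'Kenya', 'Peru')

# hypothetical observations; 'XX' is not in the table
obs <- c('KE', 'PE', 'KE', 'XX')

# position of each observation in the table, NA if it is absent
idx <- match(obs, codes)
idx
## [1]  2  3  2 NA

# use these positions as an index to get the names
countries[idx]
## [1] "Kenya" "Peru"  "Kenya" NA
```

Values without a match end up as NA in the result, so they are easy to find afterwards with is.na.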