Indexing
There are multiple ways to access or replace values in vectors or other data structures. The most common approach is to use “indexing”. This is also referred to as “slicing”.
In the examples below, note that brackets [ ] are used for indexing, whereas you have already seen that parentheses ( ) are used to call a function. Later on, you will also see the use of braces { }. It is very important not to mix these up.
Vector
Here are some examples that show how elements of vectors can be obtained by indexing.
b <- 10:15
b
## [1] 10 11 12 13 14 15
Get the first element of a vector
b[1]
## [1] 10
Get the second element of a vector
b[2]
## [1] 11
Get elements 2 and 3
b[2:3]
## [1] 11 12
# this is the same as
b[c(2,3)]
## [1] 11 12
# or
i <- 2:3
b[i]
## [1] 11 12
Now a more advanced example: return all elements except the second
b[c(1,3:6)]
## [1] 10 12 13 14 15
# or the much simpler:
b[-2]
## [1] 10 12 13 14 15
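Negative indices can also exclude several elements at once. The sketch below illustrates this with a throwaway vector v (a name introduced here only for illustration, so that b is left unchanged). It also shows that positive and negative indices cannot be mixed in a single index.

```r
# 'v' is a hypothetical vector used only for this illustration
v <- 10:15

# drop the first two elements
v[-c(1, 2)]
## [1] 12 13 14 15

# drop a range; the parentheses matter, because -(2:4) is not the same as -2:4
v[-(2:4)]
## [1] 10 14 15

# mixing positive and negative indices is an error, captured here with tryCatch
mixed <- tryCatch(v[c(-1, 2)], error = function(cond) "error")
mixed
## [1] "error"
```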
You can also use an index to change values
b[1] <- 11
b
## [1] 11 11 12 13 14 15
b[3:6] <- -99
b
## [1] 11 11 -99 -99 -99 -99
An important characteristic of R’s vectorization system is that shorter vectors are ‘recycled’. That is, they are repeated until the necessary number of elements is reached. This applies in many circumstances, and is very practical when you are aware of it. It may, however, also lead to undetected errors, when this was not intended to happen.
Here you see recycling at work. First we assign a single number to the first three elements of b, so the number is used three times. Then we assign two numbers to elements 3 to 6, such that both numbers are used twice.
b[1:3] <- 2
b
## [1] 2 2 2 -99 -99 -99
b[3:6] <- c(10,20)
b
## [1] 2 2 10 20 10 20
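Recycling is not limited to assignment; it also happens in arithmetic. The sketch below uses throwaway vectors v and w (names introduced here for illustration only). It also shows the one case where R does alert you: when the longer length is not a multiple of the shorter one, the values are still recycled, but a warning is raised.

```r
# 'v' and 'w' are hypothetical vectors used only for this illustration
v <- 1:6

# recycling in arithmetic: c(10, 20) is used three times
v + c(10, 20)
## [1] 11 22 13 24 15 26

# five positions cannot be filled evenly by two values;
# R signals a warning, which we capture here with tryCatch
w <- 1:6
outcome <- tryCatch({
    w[1:5] <- c(0L, 1L)
    "no warning"
}, warning = function(cond) "warning")
outcome
## [1] "warning"
```

When the lengths do divide evenly, as in the examples above, no warning is given, so it is worth checking vector lengths yourself when recycling is not what you intend.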
Matrix
Consider matrix m.
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
m
## a b c
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
Like vectors, values of matrices can be accessed through indexing. There are different ways to do this, but it is generally easiest to use two numbers in a double index, the first for the row number(s) and the second for the column number(s).
# one value
m[2,2]
## b
## 5
# another one
m[1,3]
## c
## 3
You can also get multiple values at once.
# 2 columns and rows
m[1:2,1:2]
## a b
## [1,] 1 2
## [2,] 4 5
# entire row
m[2, ]
## a b c
## 4 5 6
# entire column
m[ ,2]
## [1] 2 5 8
Or use the column names for subsetting.
# single column
m[, 'b']
## [1] 2 5 8
# two columns
m[, c('a', 'c')]
## a c
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9
Instead of indexing with two numbers, you can also use a single number. You can think of this as a “cell number”. Cells are numbered column-wise (i.e., first the rows in the first column, then the second column, etc.). Thus,
m[2,2]
## b
## 5
# is equivalent to
m[5]
## [1] 5
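To make the column-wise numbering concrete, here is a minimal sketch that computes a cell number from a row and column number, and then goes back again. The names mm, r, k and cell are introduced only for this illustration; arrayInd is a base R function.

```r
# 'mm' is a throwaway copy of the example matrix
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)

# cell number for row 2, column 3, counting down the columns
r <- 2
k <- 3
cell <- (k - 1) * nrow(mm) + r
cell
## [1] 8
mm[cell] == mm[r, k]
## [1] TRUE

# arrayInd() goes the other way: from a cell number to row and column
arrayInd(cell, dim(mm))
##      [,1] [,2]
## [1,]    2    3
```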
Note that
m[ ,2]
## [1] 2 5 8
returns a vector. This is because a single-column matrix can be simplified to a vector. In that case the matrix structure is ‘dropped’. This is not always desirable, and to keep this from happening, you can use the drop=FALSE argument.
m[ , 2, drop=FALSE]
## b
## [1,] 2
## [2,] 5
## [3,] 8
Setting values of a matrix is similar to how you would do that for a vector, except that you now need to deal with two dimensions.
# one value
m[1,1] <- 5
m
## a b c
## [1,] 5 2 3
## [2,] 4 5 6
## [3,] 7 8 9
# a row
m[3,] <- 10
m
## a b c
## [1,] 5 2 3
## [2,] 4 5 6
## [3,] 10 10 10
# two columns, with recycling
m[,2:3] <- 3:1
m
## a b c
## [1,] 5 3 3
## [2,] 4 2 2
## [3,] 10 1 1
There is a function to get (or set) the values on the diagonal of the matrix.
diag(m)
## [1] 5 2 1
diag(m) <- 0
m
## a b c
## [1,] 0 3 3
## [2,] 4 0 2
## [3,] 10 1 0
List
Indexing lists can be a bit confusing, as you can refer either to the elements of the list or to the elements of the data (perhaps a matrix) inside one of the list elements. Below, note the difference that double brackets make. e[3] returns a list (of length 1), but e[[3]] returns what is inside that list element (a matrix in this case).
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
e <- list(list(1:3), c('a', 'b', 'c', 'd'), m)
We can access data inside a list element by combining double and single brackets. By using the double brackets, the list structure is dropped.
e[2]
## [[1]]
## [1] "a" "b" "c" "d"
e[[2]]
## [1] "a" "b" "c" "d"
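The following sketch shows what combining double and single brackets looks like in practice. It uses a throwaway list ex with the same structure as e (the name ex is introduced here only for illustration).

```r
# 'ex' is a hypothetical list with the same structure as 'e'
ex <- list(list(1:3), c('a', 'b', 'c', 'd'), matrix(1:9, nrow=3, byrow=TRUE))

# single brackets keep the list structure, double brackets drop it
is.list(ex[3])
## [1] TRUE
is.matrix(ex[[3]])
## [1] TRUE

# row 2, column 3 of the matrix stored in the third list element
ex[[3]][2, 3]
## [1] 6

# fourth value of the character vector in the second list element
ex[[2]][4]
## [1] "d"

# the first element is itself a list, so two sets of double brackets are needed
ex[[1]][[1]][2]
## [1] 2
```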
List elements can have names.
names(e) <- c('zzz', 'xyz', 'abc')
And the elements can be extracted by their name, either as an index, or by using the $ (dollar) operator.
e$xyz
## [1] "a" "b" "c" "d"
e[['xyz']]
## [1] "a" "b" "c" "d"
The $ can also be used with data.frame objects (a special list, after all), but not with matrices.
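As a quick check of that last point, the sketch below tries the $ operator on a matrix and captures the resulting error. The matrix mm is a throwaway copy introduced only for this illustration.

```r
# 'mm' is a hypothetical matrix with column names, used only here
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(mm) <- c('a', 'b', 'c')

# $ is not defined for matrices; the error is captured with tryCatch
res <- tryCatch(mm$b, error = function(cond) "error")
res
## [1] "error"

# use the column name inside brackets instead
mm[, 'b']
## [1] 2 5 8
```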
Data.frame
Indexing a data.frame can generally be done as for matrices and for lists.
First create a data.frame from matrix m.
d <- data.frame(m)
class(d)
## [1] "data.frame"
You can extract a column by column number.
d[,2]
## [1] 2 5 8
Here is an alternative way to address the column number in a data.frame.
d[2]
## b
## 1 2
## 2 5
## 3 8
Note that whereas [2] would be the second element in a matrix, it refers to the second column in a data.frame. This is because a data.frame is a special kind of list and not a special kind of matrix.
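You can verify the list nature of a data.frame directly. In this minimal sketch, dd is a throwaway data.frame introduced only for illustration: each column is one list element, so the length of the data.frame is its number of columns.

```r
# 'dd' is a hypothetical data.frame used only for this illustration
dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# a data.frame is a list in which each column is one element
is.list(dd)
## [1] TRUE
length(dd)
## [1] 3
names(dd)
## [1] "a" "b" "c"

# single brackets return a (smaller) data.frame, as they return a list for a list
is.data.frame(dd[2])
## [1] TRUE
```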
You can also use the column name to get values. This approach also works for a matrix.
d[, 'b']
## [1] 2 5 8
But with a data.frame you can also do
d$b
## [1] 2 5 8
# or this
d[['b']]
## [1] 2 5 8
All these return a vector. That is, the complexity of the data.frame structure was dropped. This does not happen when you do
d['b']
## b
## 1 2
## 2 5
## 3 8
or
d[ , 'b', drop=FALSE]
## b
## 1 2
## 2 5
## 3 8
Why should you care about this drop business? Well, in many cases R functions want a specific data type, such as a matrix or data.frame, and report an error if they get something else. One common situation is that you think you provide data of the right type, such as a data.frame, but in fact you are providing a vector, because the structure was dropped when you subsetted the data to a single column.
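Here is a small sketch of how this can go wrong. It uses a throwaway data.frame dd (introduced only for this illustration) and the base R function colMeans, which requires an object with at least two dimensions.

```r
# 'dd' is a hypothetical data.frame used only for this illustration
dd <- data.frame(a = 1:3, b = 4:6)

# subsetting to one column drops the structure: we now have a plain vector
one <- dd[, 'b']
is.data.frame(one)
## [1] FALSE
ncol(one)
## NULL

# colMeans() fails on the vector; the error is captured with tryCatch
res <- tryCatch(colMeans(one), error = function(cond) "error")
res
## [1] "error"

# with drop=FALSE the data.frame structure is kept and colMeans() works
safe <- dd[, 'b', drop=FALSE]
colMeans(safe)
## b
## 5
```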
Which, %in% and match
Sometimes you do not have the indices you need, and so you need to find them. For example, what are the indices of the elements in a vector that have values above 15?
x <- 10:20
i <- which(x > 15)
x
## [1] 10 11 12 13 14 15 16 17 18 19 20
i
## [1] 7 8 9 10 11
x[i]
## [1] 16 17 18 19 20
Note, however, that you can also use a logical vector for indexing (values for which the index is TRUE are returned).
x <- 10:20
b <- x > 15
x
## [1] 10 11 12 13 14 15 16 17 18 19 20
b
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
x[b]
## [1] 16 17 18 19 20
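Logical vectors can be combined with & (and) and | (or) to select on several conditions at once. The sketch below uses throwaway vectors v and y (names introduced only for this illustration). It also shows one difference between which and a plain logical index: which ignores missing values, whereas a logical index returns NA for them.

```r
# 'v' and 'y' are hypothetical vectors used only for this illustration
v <- 10:20

# both conditions must be TRUE
v[v > 12 & v < 16]
## [1] 13 14 15

# at least one condition must be TRUE
v[v < 12 | v > 18]
## [1] 10 11 19 20

# with a missing value, the logical index itself contains NA
y <- c(5, NA, 25)
y[y > 15]
## [1] NA 25

# which() only returns the positions that are TRUE
y[which(y > 15)]
## [1] 25
```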
A very useful operator that allows you to ask whether a set of values is present in a vector is %in%.
x <- 10:20
j <- c(7,9,11,13)
j %in% x
## [1] FALSE FALSE TRUE TRUE
which(j %in% x)
## [1] 3 4
Another handy, similar function is match:
match(j, x)
## [1] NA NA 2 4
This tells us that the third value in j is equal to the second value in x and that the fourth value in j is equal to the fourth value in x.
match is asymmetric: match(j,x) is not the same as match(x,j).
match(x, j)
## [1] NA 3 NA 4 NA NA NA NA NA NA NA
This shows that the second value in x is equal to the third value in j, etc.
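A common use of match is to look up values in a table. The sketch below is only an illustration; the vectors keys, values and query are hypothetical names that do not appear in the examples above. It also shows how %in% relates to match.

```r
# hypothetical lookup table: 'keys' and 'values' belong together by position
keys <- c('a', 'b', 'c')
values <- c(10, 20, 30)
query <- c('b', 'a', 'c', 'a', 'z')

# position of each query item in keys (NA if it is not present)
pos <- match(query, keys)
pos
## [1]  2  1  3  1 NA

# use these positions as an index to get the matching values
values[pos]
## [1] 20 10 30 10 NA

# %in% gives TRUE exactly where match() finds a position
identical(query %in% keys, !is.na(match(query, keys)))
## [1] TRUE
```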