# 4. Indexing

There are multiple ways to access or replace values in vectors or other data structures. The most common approach is to use “indexing”. This is also referred to as “slicing”.

Note that brackets `[ ]` are used for indexing, whereas parentheses `( )` are used to call a function.

## Vector

Here are some examples that show how elements of vectors can be obtained by indexing.

```
b <- 10:15
b
## [1] 10 11 12 13 14 15
# get the first element
b[1]
## [1] 10
# the second element
b[2]
## [1] 11
# elements 2 to 3
b[2:3]
## [1] 11 12
```

Now a more advanced example: return all elements except the second.

```
b[c(1,3:6)]
## [1] 10 12 13 14 15
# or the simpler:
b[-2]
## [1] 10 12 13 14 15
```
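Negative indices can also be combined to leave out several elements at once. The following is a minimal sketch; the vector `v` is a hypothetical example and not part of the data used elsewhere in this chapter. Note that positive and negative indices cannot be mixed in a single index; R reports an error if you try.

```r
v <- 10:15
# leave out the first two elements
v[-c(1, 2)]
## [1] 12 13 14 15
# leave out the last element, whatever the length of the vector
v[-length(v)]
## [1] 10 11 12 13 14
```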

You can also use an index to change values.

```
b[1] <- 11
b
## [1] 11 11 12 13 14 15
b[3:6] <- -99
b
## [1] 11 11 -99 -99 -99 -99
```

An important characteristic of *R*’s vectorization system is that
shorter vectors are ‘recycled’. That is, they are repeated until the
necessary number of elements is reached. This applies in many
circumstances, and is very practical when you are aware of it. It may,
however, also lead to undetected errors when recycling happens
unintentionally.

Here you see recycling at work. First we assign a single number to the
first three elements of `b`, so the number is used three times. Then
we assign two numbers to elements 3 to 6, such that both numbers
are used twice.

```
b[1:3] <- 2
b
## [1] 2 2 2 -99 -99 -99
b[3:6] <- c(10,20)
b
## [1] 2 2 10 20 10 20
```
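Recycling is not limited to assignment; it also happens in arithmetic. The sketch below uses a hypothetical vector `v` to illustrate when recycling is silent and when R warns about it. When the longer length is an exact multiple of the shorter one, nothing is reported, which is how unintended recycling can go unnoticed.

```r
v <- 1:6
# lengths 6 and 2: the shorter vector is used three times, silently
v + c(10, 20)
## [1] 11 22 13 24 15 26
# lengths 6 and 4: the values are still recycled, but R also prints a
# warning that the longer length is not a multiple of the shorter one
v + c(1, 2, 3, 4)
## [1] 2 4 6 8 6 8
```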

## Matrix

Consider matrix `m`.

```
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
m
## a b c
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
```

Like vectors, values of matrices can be accessed through indexing. There are different ways to do this, but it is generally easiest to use two numbers in a double index, the first for the row number(s) and the second for the column number(s).

```
# one value
m[2,2]
## b
## 5
# another one
m[1,3]
## c
## 3
```

You can also get multiple values at once.

```
# first two rows and columns
m[1:2,1:2]
## a b
## [1,] 1 2
## [2,] 4 5
# entire row
m[2, ]
## a b c
## 4 5 6
# entire column
m[ ,2]
## [1] 2 5 8
```

Or use the column names for sub-setting.

```
# single column
m[, 'b']
## [1] 2 5 8
# two columns
m[, c('a', 'c')]
## a c
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9
```

Instead of indexing with two numbers, you can also use a single number. You can think of this as a “cell number”. Cells are numbered column-wise (i.e., first the rows in the first column, then the second column, etc.). Thus,

```
m[2,2]
## b
## 5
# is equivalent to
m[5]
## [1] 5
```
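To make the column-wise numbering more concrete, here is a small sketch. It uses a hypothetical copy of the matrix called `mm`, and it uses `which` with the `arr.ind=TRUE` argument to translate a cell back to its row and column.

```r
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
# the first four cells: all of column 1, then the first row of column 2
mm[1:4]
## [1] 1 4 7 2
# row and column of the cell that holds the value 5 (row 2, column 2)
pos <- which(mm == 5, arr.ind=TRUE)
pos
```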

Note that

```
m[ ,2]
## [1] 2 5 8
```

returns a vector. This is because a single-column matrix can be
simplified to a vector. In that case the matrix structure is ‘dropped’.
This is not always desirable, and to keep this from happening, you can
use the `drop=FALSE` argument.

```
m[ , 2, drop=FALSE]
## b
## [1,] 2
## [2,] 5
## [3,] 8
```
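One way to check whether the structure was dropped is to look at the dimensions of the result. This is a minimal sketch with a hypothetical copy of the matrix named `mm`.

```r
mm <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(mm) <- c('a', 'b', 'c')
# a vector has no dimensions
dim(mm[ , 2])
## NULL
# with drop=FALSE the result is still a matrix, with 3 rows and 1 column
dim(mm[ , 2, drop=FALSE])
## [1] 3 1
# the same applies when selecting a single row
dim(mm[2, , drop=FALSE])
## [1] 1 3
```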

Setting values of a matrix is similar to how you would do that for a vector, except that you now need to deal with two dimensions.

```
# one value
m[1,1] <- 5
m
## a b c
## [1,] 5 2 3
## [2,] 4 5 6
## [3,] 7 8 9
# a row
m[3,] <- 10
m
## a b c
## [1,] 5 2 3
## [2,] 4 5 6
## [3,] 10 10 10
# two columns, with recycling
m[,2:3] <- 3:1
m
## a b c
## [1,] 5 3 3
## [2,] 4 2 2
## [3,] 10 1 1
```

There is a function to get (or set) the values on the diagonal.

```
diag(m)
## [1] 5 2 1
diag(m) <- 0
m
## a b c
## [1,] 0 3 3
## [2,] 4 0 2
## [3,] 10 1 0
```

## List

Indexing lists can be a bit confusing as you can refer either to the
elements of the list, or to the elements of the data (perhaps a matrix) in
one of the list elements. Note the difference that double brackets make.
For the list `e` created below, `e[3]` returns a list (of length 1), but
`e[[3]]` returns what is inside that list element (a matrix in this case).

```
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
e <- list(list(1:3), c('a', 'b', 'c', 'd'), m)
```

We can access data inside a list element by combining double and single brackets. By using the double brackets, the list structure is dropped.

```
e[2]
## [[1]]
## [1] "a" "b" "c" "d"
e[[2]]
## [1] "a" "b" "c" "d"
```
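The combination of double and single brackets can be sketched as follows. The names `mat` and `lst` are hypothetical stand-ins that mirror `m` and `e`, so that the example can be run on its own.

```r
mat <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(mat) <- c('a', 'b', 'c')
lst <- list(list(1:3), c('a', 'b', 'c', 'd'), mat)
# second list element (a character vector), then its third value
lst[[2]][3]
## [1] "c"
# third list element (the matrix), then its second row
lst[[3]][2, ]
## a b c
## 4 5 6
# the first list element is itself a list, so double brackets are needed twice
lst[[1]][[1]][2]
## [1] 2
```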

List elements can have names.

```
names(e) <- c('zzz', 'xyz', 'abc')
```

And the elements can be extracted by their name, either as an index, or
by using the `$` (dollar) operator.

```
e$xyz
## [1] "a" "b" "c" "d"
e[['xyz']]
## [1] "a" "b" "c" "d"
```

The `$` can also be used with `data.frame` objects (a special list,
after all), but not with matrices.

## Data.frame

Indexing a `data.frame` can generally be done as for matrices and for
lists.

First create a `data.frame` from `matrix` `m`.

```
d <- data.frame(m)
class(d)
## [1] "data.frame"
```

You can extract a column by column number.

```
d[,2]
## [1] 2 5 8
```

Here is an alternative way to address the column number in a
`data.frame`.

```
d[2]
## b
## 1 2
## 2 5
## 3 8
```

Note that whereas `[2]` would be the second *element* in a `matrix`,
it refers to the second *column* in a `data.frame`. This is because a
`data.frame` is a special kind of list and not a special kind of
matrix.
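The difference can be sketched side by side. The names `mat` and `dd` are hypothetical copies of the matrix and the `data.frame`, used here so that the example is self-contained.

```r
mat <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(mat) <- c('a', 'b', 'c')
dd <- data.frame(mat)
# matrix: the second cell, counted column-wise
mat[2]
## [1] 4
# data.frame: the second column, returned as a data.frame
class(dd[2])
## [1] "data.frame"
# a matrix has one element per cell, a data.frame one element per column
length(mat)
## [1] 9
length(dd)
## [1] 3
```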

You can also use the column name to get values. This approach also works
for a `matrix`.

```
d[, 'b']
## [1] 2 5 8
```

But with a `data.frame` you can also do

```
d$b
## [1] 2 5 8
# or this
d[['b']]
## [1] 2 5 8
```

All these return a vector. That is, the complexity of the `data.frame`
structure was `dropped`. This does not happen when you do

```
d['b']
## b
## 1 2
## 2 5
## 3 8
```

or

```
d[ , 'b', drop=FALSE]
## b
## 1 2
## 2 5
## 3 8
```

Why should you care about this `drop` business? Well, in many cases
*R* functions want a specific data type, such as a `matrix` or
`data.frame` and report an error if they get something else. One
common situation is that you think you provide data of the right type,
such as a `data.frame`, but that in fact you are providing a
`vector`, because the structure `dropped`.
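As an illustration of this situation, the sketch below passes a single column to `colMeans`, a base R function that requires data with two dimensions. The `data.frame` `dd` is a hypothetical example, and `try` is used only so that the error does not stop the script.

```r
dd <- data.frame(a=c(1, 4, 7), b=c(2, 5, 8))
# a data.frame with one column still has two dimensions, so this works
colMeans(dd[ , 'b', drop=FALSE])
## b
## 5
# without drop=FALSE a vector is passed, and colMeans reports an error
r <- try(colMeans(dd[ , 'b']), silent=TRUE)
class(r)
## [1] "try-error"
```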

## Which, %in% and match

Sometimes you do not have the indices you need, and so you need to find them. For example, what are the indices of the elements in a vector that have values above 15?

```
x <- 10:20
i <- which(x > 15)
x
## [1] 10 11 12 13 14 15 16 17 18 19 20
i
## [1] 7 8 9 10 11
x[i]
## [1] 16 17 18 19 20
```

Note, however, that you can also use a logical vector for indexing
(values for which the index is `TRUE` are returned).

```
x <- 10:20
b <- x > 15
x
## [1] 10 11 12 13 14 15 16 17 18 19 20
b
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
x[b]
## [1] 16 17 18 19 20
```
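Logical indices can be built from more than one condition, and they can be used to replace values as well as to extract them. This is a minimal sketch with a hypothetical vector `v`.

```r
v <- 10:20
# combine conditions with & (and) or | (or)
v[v > 12 & v < 16]
## [1] 13 14 15
v[v < 12 | v > 18]
## [1] 10 11 19 20
# a logical index can also be used to replace values
v[v > 15] <- 0
v
## [1] 10 11 12 13 14 15 0 0 0 0 0
```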

A very useful operator that allows you to ask whether a set of values is
present in a vector is `%in%`.

```
x <- 10:20
j <- c(7,9,11,13)
j %in% x
## [1] FALSE FALSE TRUE TRUE
which(j %in% x)
## [1] 3 4
```
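Because `%in%` returns a logical vector, its result can be used directly as an index. The sketch below uses the hypothetical names `v` and `k` to select the values of one vector that do, or do not, occur in another.

```r
v <- 10:20
k <- c(7, 9, 11, 13)
# values of v that occur in k
v[v %in% k]
## [1] 11 13
# values of v that do not occur in k
v[!(v %in% k)]
## [1] 10 12 14 15 16 17 18 19 20
```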

Another handy, similar function is `match`:

```
match(j, x)
## [1] NA NA 2 4
```

This tells us that the third value in `j` is equal to the second value in
`x` and that the fourth value in `j` is equal to the fourth value in
`x`.

`match` is asymmetric: `match(j,x)` is not the same as
`match(x,j)`.

```
match(x, j)
## [1] NA 3 NA 4 NA NA NA NA NA NA NA
```

This tells us that the second value in `x` is equal to the third value
in `j`, etc.
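A common use of `match` is to look up values in a table: the positions it returns can index a second vector of the same length. The sketch below assumes a hypothetical pair of vectors, `codes` and `fruit`, that belong together element by element; values that are not found yield `NA`.

```r
# hypothetical lookup table: codes[i] belongs to fruit[i]
codes <- c('a', 'b', 'c')
fruit <- c('apple', 'banana', 'cherry')
obs <- c('c', 'a', 'c', 'z')
# position of each observed code in the lookup table
i <- match(obs, codes)
i
## [1] 3 1 3 NA
# use these positions to get the matching names
fruit[i]
## [1] "cherry" "apple" "cherry" NA
```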