# 3. Basic data structures

In the previous chapter we saw the most basic data types in *R*:
numeric, integer, character, factor and boolean values, all of which
were stored in vectors. In this chapter we look at additional data
structures that can store basic data: the `matrix`, `data.frame` and
`list`.

## Matrix

A vector is a one-dimensional array. A two-dimensional array can be represented with a matrix. Here is how you can create a matrix with two rows and three columns.

```
matrix(ncol=3, nrow=2)
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
```

The matrix above did not have any values: all values were
missing (`NA`). Let's make a matrix with values 1 to 6.

```
matrix(1:6, ncol=3, nrow=2)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

Note that by default the values are distributed column-wise. To go
row-wise you can use the `byrow=TRUE` argument.

```
matrix(1:6, ncol=3, nrow=2, byrow=TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```

This can also be achieved by switching the number of columns and rows
and using the `t` (transpose) function.

```
m <- matrix(1:6, ncol=2, nrow=3)
t(m)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```
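
As a quick check that the two approaches give the same result, the sketch below builds the matrix both ways and compares them with `identical`. The object names `by_row` and `via_t` are only illustrative and do not appear elsewhere in this chapter.

```r
# fill the matrix row-wise directly
by_row <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE)
# fill a 3 x 2 matrix column-wise, then transpose it
via_t <- t(matrix(1:6, ncol=2, nrow=3))
identical(by_row, via_t)
## [1] TRUE
```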

It is common to create a matrix by column-binding and/or row-binding
vectors using `cbind` and `rbind`. These are two of the most
commonly used functions in *R*, so pay close attention!

```
a <- c(1,2,3)
b <- 5:7
```

column binding

```
m1 <- cbind(a, b)
m1
##      a b
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
```

row binding

```
m2 <- rbind(a, b)
m2
##   [,1] [,2] [,3]
## a    1    2    3
## b    5    6    7
```

You can also use `cbind` and `rbind` to combine matrices, as long as
the dimensions match: `cbind` requires the same number of rows in both
objects, and `rbind` requires the same number of columns.

```
m3 <- cbind(b, b, a)
m <- cbind(m1, m3)
m
##      a b b b a
## [1,] 1 5 5 5 1
## [2,] 2 6 6 6 2
## [3,] 3 7 7 7 3
```
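
The matching-dimension requirement can be illustrated with a small sketch. It recreates `a`, `b`, `m1` and `m2` from above so that it runs on its own; the use of `try` to capture the error is just one way to show the failure without stopping the script.

```r
a <- c(1,2,3)
b <- 5:7
m1 <- cbind(a, b)   # 3 rows, 2 columns
m2 <- rbind(a, b)   # 2 rows, 3 columns

# row binding works when the number of columns matches
dim(rbind(m2, m2))
## [1] 4 3
# t(m1) has 3 columns, so it can be stacked on m2 as well
dim(rbind(m2, t(m1)))
## [1] 4 3
# m1 has 3 rows and m2 has 2 rows, so column binding fails
r <- try(cbind(m1, m2), silent=TRUE)
inherits(r, 'try-error')
## [1] TRUE
```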

We can get some of the structural properties of a matrix with functions
such as `nrow`, `ncol`, `dim` and `length`.

```
nrow(m)
## [1] 3
ncol(m)
## [1] 5
# dimensions of m (nrow, ncol)
dim(m)
## [1] 3 5
# number of cells, or nrow(m) * ncol(m)
length(m)
## [1] 15
```
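
How these functions relate to each other can be checked directly. The sketch below uses a fresh 3 x 5 matrix (the values 1 to 15 are arbitrary) so that it runs on its own.

```r
m <- matrix(1:15, nrow=3, ncol=5)
# dim() returns the pair (nrow, ncol)
identical(dim(m), c(nrow(m), ncol(m)))
## [1] TRUE
# length() counts the cells
length(m) == prod(dim(m))
## [1] TRUE
```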

Columns have (variable) names that can be changed.

```
# get the column names
colnames(m)
## [1] "a" "b" "b" "b" "a"
# set the column names
colnames(m) <- c('ID', 'X', 'Y', 'v1', 'v2')
m
##      ID X Y v1 v2
## [1,]  1 5 5  5  1
## [2,]  2 6 6  6  2
## [3,]  3 7 7  7  3
```

Likewise there are row names, but these are less important.

```
rownames(m) <- paste0('row_', 1:nrow(m))
m
##       ID X Y v1 v2
## row_1  1 5 5  5  1
## row_2  2 6 6  6  2
## row_3  3 7 7  7  3
```
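
Row and column names can also be supplied when the matrix is created, through the `dimnames` argument of `matrix`, which takes a list with the row names first and the column names second. The sketch below uses a small 2 x 3 matrix with made-up names.

```r
m <- matrix(1:6, nrow=2, ncol=3,
            dimnames=list(c('row_1', 'row_2'), c('ID', 'X', 'Y')))
m
##       ID X Y
## row_1  1 3 5
## row_2  2 4 6
```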

A matrix can only store a single data type. If you try to mix character and numeric values, all values will become character values, because any number can be written as text while not every character value can be converted to a number.

```
cbind(vchar=c('a','b'), vnumb=1:2)
##      vchar vnumb
## [1,] "a"   "1"
## [2,] "b"   "2"
```

You can see that 1 and 2 are character values because they are quoted.
You could not use them in arithmetic without first converting them back
to numbers. Note that the column names were set by providing them as
argument names to `cbind`.
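
A minimal sketch of what this means in practice: the matrix is stored as character, arithmetic on its values fails, and `as.numeric` converts them back. The column is selected by name with `m[, 'vnumb']`, and `try` is used only to capture the error without stopping the script.

```r
m <- cbind(vchar=c('a','b'), vnumb=1:2)
# the whole matrix is stored as character
typeof(m)
## [1] "character"
# arithmetic on the text values fails
r <- try(m[, 'vnumb'] + 1, silent=TRUE)
inherits(r, 'try-error')
## [1] TRUE
# convert the column back to numbers first
as.numeric(m[, 'vnumb']) + 1
## [1] 2 3
```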

A matrix is a two-dimensional array. Higher-dimensional arrays can also
be created (see `help(array)`), but they are not used as often, so we
do not discuss them here.

## List

A `list` is a very flexible container to store data. Each element of a
list can contain any type of *R* object, e.g. a vector, matrix,
data.frame, another list, or more complex data types.

A simple list:

```
list(1:3)
## [[1]]
## [1] 1 2 3
```

It shows that the first element, `[[1]]`, contains the vector
`1, 2, 3`.

Here is one with two data types.

```
e <- list(c(2,5), 'abc')
e
## [[1]]
## [1] 2 5
##
## [[2]]
## [1] "abc"
```

List elements can be named.

```
names(e) <- c('first', 'last')
e
## $first
## [1] 2 5
##
## $last
## [1] "abc"
```
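
Names can also be given when the list is created, instead of being assigned afterwards. The sketch below builds the list both ways and checks that the results are the same; `e2` is just an illustrative name.

```r
# names assigned after creation, as above
e <- list(c(2,5), 'abc')
names(e) <- c('first', 'last')
# names given at creation
e2 <- list(first=c(2,5), last='abc')
identical(e, e2)
## [1] TRUE
```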

And a more complex list.

```
m <- matrix(1:6, ncol=3, nrow=2)
f <- list(e, m, 'abc')
f
## [[1]]
## [[1]]$first
## [1] 2 5
##
## [[1]]$last
## [1] "abc"
##
##
## [[2]]
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## [[3]]
## [1] "abc"
```

Note that the first element of list `f` is itself a list of two
elements.
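
The nesting can be inspected with a few standard functions. This sketch recreates `e`, `m` and `f` so that it runs on its own, and uses the `[[ ]]` notation shown in the printed output to pick out single elements.

```r
e <- list(first=c(2,5), last='abc')
m <- matrix(1:6, ncol=3, nrow=2)
f <- list(e, m, 'abc')
# f has three elements
length(f)
## [1] 3
# the first element is a list with two elements
is.list(f[[1]])
## [1] TRUE
length(f[[1]])
## [1] 2
# the second element is a matrix
is.matrix(f[[2]])
## [1] TRUE
```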

## Data frame

The `data.frame` is the workhorse for statistical data analysis in
*R*. It is rectangular like a matrix, but unlike matrices a
`data.frame` can have columns (variables) of different data types. A
`data.frame` is what you get when you read spreadsheet-like data into
*R* with functions like `read.table` or `read.csv`. We'll show that
in a later chapter. We can also create a `data.frame` with some simple
code.

```
# four vectors
ID <- as.integer(1:4)
name <- c('Ana', 'Rob', 'Liu', 'Veronica')
sex <- as.factor(c('F','M','M','F'))
score <- c(10.2, 9, 13.5, 18)
d <- data.frame(ID, name, sex, score, stringsAsFactors=FALSE)
d
##   ID     name sex score
## 1  1      Ana   F  10.2
## 2  2      Rob   M   9.0
## 3  3      Liu   M  13.5
## 4  4 Veronica   F  18.0
```

We used the argument `stringsAsFactors=FALSE` to avoid converting the
character variable `name` to a factor (this has been the default since
*R* 4.0.0, so the argument only matters in older versions). `d` is a
data.frame, but individual columns can be of any class. Note that the
length of a data.frame is defined as the number of variables (columns),
while the length of a matrix is defined as the number of cells! This is
because a matrix is a special kind of `vector`, while a `data.frame` is
a special kind of `list` in which each element has the same length.

```
class(d)
## [1] "data.frame"
length(d)
## [1] 4
```
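
The difference in how `length` is defined shows up when the same values are stored both ways. The sketch below uses an arbitrary 4 x 2 matrix `x` and converts it with `as.data.frame`; the names `x` and `d2` are only illustrative.

```r
x <- matrix(1:8, nrow=4, ncol=2)
d2 <- as.data.frame(x)
# a matrix counts cells
length(x)
## [1] 8
# a data.frame counts columns
length(d2)
## [1] 2
# the number of cells in the data.frame
nrow(d2) * ncol(d2)
## [1] 8
```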

Because a `data.frame` is a special kind of list, you can do with a
data.frame what you can do with a list.

```
is.list(d)
## [1] TRUE
names(d)
## [1] "ID"    "name"  "sex"   "score"
```
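
For instance, a column can be extracted with the same `[[ ]]` notation that is used for list elements, and each column keeps its own class. The sketch below recreates `d` so that it runs on its own.

```r
ID <- 1:4
name <- c('Ana', 'Rob', 'Liu', 'Veronica')
sex <- as.factor(c('F','M','M','F'))
score <- c(10.2, 9, 13.5, 18)
d <- data.frame(ID, name, sex, score, stringsAsFactors=FALSE)
# extract a column like a list element
d[['score']]
## [1] 10.2  9.0 13.5 18.0
# each column keeps its own class
class(d[['name']])
## [1] "character"
class(d[['sex']])
## [1] "factor"
```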

But in other ways, a `data.frame` is also similar to a matrix (which
normal lists are not).

```
nrow(d)
## [1] 4
dim(d)
## [1] 4 4
colnames(d)
## [1] "ID"    "name"  "sex"   "score"
```
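
The similarity has a limit that follows from the single-type rule for matrices: converting a `data.frame` with mixed column types using `as.matrix` turns everything into character values. The sketch below uses a small two-row data.frame made up for this purpose.

```r
d <- data.frame(ID=1:2, name=c('Ana','Rob'), score=c(10.2, 9),
                stringsAsFactors=FALSE)
x <- as.matrix(d)
# mixed column types become a character matrix
typeof(x)
## [1] "character"
dim(x)
## [1] 2 3
# with only numeric columns the result is a numeric matrix
typeof(as.matrix(d[, c('ID', 'score')]))
## [1] "double"
```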