5. Algebra

Vectors and matrices can be used to compute new vectors (matrices) with simple and intuitive algebraic expressions.

Vector algebra

We have two vectors, a and b.

a <- 1:5
b <- 6:10

Multiplication works element by element. That is, a[1] * b[1], a[2] * b[2], etc.

d <- a * b
a
## [1] 1 2 3 4 5
b
## [1]  6  7  8  9 10
d
## [1]  6 14 24 36 50

The example above illustrates a special feature of R that is not found in most other programming languages: you do not need to ‘loop’ over the elements of an array (a vector in this case) to compute new values. It is important to use this feature as much as possible. In many other programming languages you would need to write something like a ‘for-loop’ to achieve the same result (for-loops do exist in R and are discussed in a later chapter).
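For comparison, here is a minimal sketch of what a loop-based equivalent of a * b could look like. The name d_loop is introduced here for illustration only.

```r
a <- 1:5
b <- 6:10

# pre-allocate the result, then fill it one element at a time
d_loop <- numeric(length(a))
for (i in seq_along(a)) {
  d_loop[i] <- a[i] * b[i]
}

# the loop and the vectorized expression give the same values
all(d_loop == a * b)
## [1] TRUE
```

The vectorized form is shorter and leaves less room for indexing mistakes.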

You can also multiply by a single number.

a * 3
## [1]  3  6  9 12 15

In the examples above the computations used either vectors of the same length, or one of the vectors had length 1. But be careful: you can also do algebraic computations with vectors of different lengths, because the shorter one will be “recycled”. R only issues a warning if the length of the longer vector is not a multiple of the length of the shorter vector. This is a great feature when you need it, but it may also make you overlook errors when your data are not what you think they are.

a + c(1,10)
## Warning in a + c(1, 10): longer object length is not a multiple of shorter
## object length
## [1]  2 12  4 14  6

No warning here:

1:6 + c(0,10)
## [1]  1 12  3 14  5 16
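To make the recycling rule more concrete, the sketch below spells it out with rep() and adds a small length check. The names recycled and no_recycling are hypothetical and only serve this illustration.

```r
a <- 1:5

# rep() with length.out spells out what recycling does implicitly
recycled <- rep(c(1, 10), length.out = length(a))
recycled
## [1]  1 10  1 10  1

# same values as a + c(1, 10), but without the warning
a + recycled
## [1]  2 12  4 14  6

# hypothetical helper: TRUE only if the lengths match or one side has length 1
no_recycling <- function(x, y) {
  length(x) == length(y) || length(x) == 1 || length(y) == 1
}
no_recycling(1:6, c(0, 10))
## [1] FALSE
no_recycling(1:6, 3)
## [1] TRUE
```

A check like this can be useful before a computation where silent recycling would hide a data problem.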

Logical comparisons

Recall that == is used to test for equality.

a == 2
## [1] FALSE  TRUE FALSE FALSE FALSE
f <- a > 2
f
## [1] FALSE FALSE  TRUE  TRUE  TRUE

& is Boolean “AND”, and | is Boolean “OR”.

a
## [1] 1 2 3 4 5
b
## [1]  6  7  8  9 10
b > 6 & b < 8
## [1] FALSE  TRUE FALSE FALSE FALSE
# combining a and b
b > 9 | a < 2
## [1]  TRUE FALSE FALSE FALSE  TRUE

“Less than or equal” is <=, and “more than or equal” is >=.

b >= 9
## [1] FALSE FALSE FALSE  TRUE  TRUE
a <= 2
## [1]  TRUE  TRUE FALSE FALSE FALSE
b >= 9 | a <= 2
## [1]  TRUE  TRUE FALSE  TRUE  TRUE
b >= 9 & a <= 2
## [1] FALSE FALSE FALSE FALSE FALSE
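The logical vectors produced by these comparisons can be summarized directly. The short sketch below assumes the same a and b as above and uses the base functions sum, any and all.

```r
a <- 1:5
b <- 6:10

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(b >= 9 | a <= 2)
## [1] 4

# any() and all() reduce a logical vector to a single TRUE or FALSE
any(b >= 9 & a <= 2)
## [1] FALSE
all(a <= b)
## [1] TRUE
```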

Functions

There are many functions that allow us to do vectorized algebra. For example:

sqrt(a)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
exp(a)
## [1]   2.718282   7.389056  20.085537  54.598150 148.413159

Not all functions return a vector of the same length. The following functions return a single number (or, in the case of range, two numbers: the minimum and the maximum):

min(a)
## [1] 1
max(a)
## [1] 5
range(a)
## [1] 1 5
sum(a)
## [1] 15
mean(a)
## [1] 3
median(a)
## [1] 3
prod(a)
## [1] 120
sd(a)
## [1] 1.581139

If you cannot guess what prod and sd do, look them up in the help files (e.g. ?sd).
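As a cross-check on the help files, the values of prod and sd for a can be reproduced by hand. This is a sketch using the usual sample standard deviation formula (division by n - 1); sd_by_hand is a name made up for this example.

```r
a <- 1:5

# prod() multiplies all elements: 1 * 2 * 3 * 4 * 5
prod(a)
## [1] 120

# sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
sd_by_hand <- sqrt(sum((a - mean(a))^2) / (length(a) - 1))
sd_by_hand
## [1] 1.581139
all.equal(sd_by_hand, sd(a))
## [1] TRUE
```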

Random numbers

It is common to create a vector of random numbers in data analysis, and also to create example data to demonstrate how a procedure works. To get 10 numbers sampled from the uniform distribution between 0 and 1 you can do:

r <- runif(10)
r
##  [1] 0.07003268 0.96431704 0.44251011 0.37027238 0.14118509 0.05419043
##  [7] 0.65782807 0.57816192 0.98710176 0.60379240

For Normally distributed numbers, use rnorm.

r <- rnorm(10, mean=10, sd=2)
r
##  [1]  6.971006  9.876585  9.705458 13.083186  8.036289 10.993156 13.393896
##  [8]  9.478527  8.588143  9.677643
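One way to see what the mean and sd arguments do is to draw a much larger sample and compare its statistics with the requested values. The sample size of 100000 is an arbitrary choice for this sketch, and the exact differences will vary from run to run, so no output is shown.

```r
# draw a large sample so the sample statistics settle near the parameters
x <- rnorm(100000, mean = 10, sd = 2)

# both differences should be small, though not exactly zero
abs(mean(x) - 10)
abs(sd(x) - 2)
```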

If you run the functions above, you will get different numbers. After all, they are random numbers! Well, computer-generated numbers are not truly random, but ‘pseudo-random’. To be able to exactly reproduce examples or data analyses, we often want to ensure that we take exactly the same “random” sample each time we run our code. To do that we use set.seed. This function initializes the random number generator to a specific point. This is illustrated below.

set.seed(12)
runif(3)
## [1] 0.06936092 0.81777520 0.94262173
runif(4)
## [1] 0.26938188 0.16934812 0.03389562 0.17878500
runif(5)
## [1] 0.641665366 0.022877743 0.008324827 0.392697197 0.813880559

set.seed(12)
runif(3)
## [1] 0.06936092 0.81777520 0.94262173
runif(5)
## [1] 0.26938188 0.16934812 0.03389562 0.17878500 0.64166537

set.seed(12)
runif(3)
## [1] 0.06936092 0.81777520 0.94262173
runif(5)
## [1] 0.26938188 0.16934812 0.03389562 0.17878500 0.64166537

Note that each time set.seed is called with the same seed number, the same sequence of (pseudo) random numbers will be generated. This is a very important feature, as it allows us to exactly reproduce results that involve random sampling. The seed number is arbitrary; a different seed number will give a different sequence.
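Both statements can be checked with identical(). In this sketch the seed 13 is just another arbitrary choice, and first, second and third are illustrative names.

```r
set.seed(12)
first <- runif(5)
set.seed(12)
second <- runif(5)

# same seed, same sequence
identical(first, second)
## [1] TRUE

# a different (arbitrary) seed gives a different sequence
set.seed(13)
third <- runif(5)
identical(first, third)
## [1] FALSE
```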

Matrices

Computation with matrices is also ‘vectorized’. For example, with matrix m you can do m * 5 to multiply all values of m by 5, or do m^2 or m * m to square the values of m.

# set up an example matrix
m <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE)
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

m * 2
##      [,1] [,2] [,3]
## [1,]    2    4    6
## [2,]    8   10   12

m^2
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36

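We can also do math with a matrix and a vector. Note, again, that computation with matrices in R is column-wise, and that shorter vectors are recycled. In the example below, the vector 1:2 is recycled down each column, so the first row is multiplied by 1 and the second row by 2.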

m * 1:2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    8   10   12
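The column-wise recycling can be made visible by building the recycled values as a matrix of the same shape. This is only a sketch; v is a name introduced for the illustration.

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

# recycling 1:2 over the 6 cells, filled column by column, gives this matrix
v <- matrix(1:2, nrow = 2, ncol = 3)
v
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2

# so m * 1:2 multiplies row 1 by 1 and row 2 by 2
all(m * 1:2 == m * v)
## [1] TRUE
```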

You can multiply two matrices.

m * m
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36

Note that this is “cell by cell” multiplication. For ‘matrix multiplication’ in the mathematical sense, you need to use the %*% operator.

m %*% t(m)
##      [,1] [,2]
## [1,]   14   32
## [2,]   32   77
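To see where these numbers come from, one element of the product can be computed by hand. The sketch below assumes the same m as above.

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

# m is 2 x 3 and t(m) is 3 x 2, so the product is 2 x 2
dim(m %*% t(m))
## [1] 2 2

# element [1, 2] is the sum of products of row 1 of m and column 2 of t(m)
sum(m[1, ] * t(m)[, 2])
## [1] 32
```

Note that m %*% m would fail here, because a 2 x 3 matrix cannot be matrix-multiplied with another 2 x 3 matrix; the number of columns of the first must equal the number of rows of the second.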