Algebra

Vectors and matrices can be used to compute new vectors (matrices) with simple and intuitive algebraic expressions.

Vector algebra

We start with two vectors, a and b.

a <- 1:5
b <- 6:10

Multiplication works element by element: a[1] * b[1], a[2] * b[2], and so on.

d <- a * b
a
## [1] 1 2 3 4 5
b
## [1]  6  7  8  9 10
d
## [1]  6 14 24 36 50

The example above illustrates a feature of R that many other programming languages lack: you do not need to ‘loop’ over the elements of an array (a vector in this case) to compute new values. Use this feature wherever you can. In many other languages you would need to write a for-loop to achieve the same result. (For-loops do exist in R; they are important and are discussed in a later chapter.)
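For comparison, here is a minimal sketch of what the explicit loop would look like next to the vectorized expression. The name d_loop is introduced only for this illustration.

```r
a <- 1:5
b <- 6:10

# Loop version: allocate the result, then fill it one element at a time
d_loop <- numeric(length(a))
for (i in seq_along(a)) {
  d_loop[i] <- a[i] * b[i]
}

# Vectorized version: one expression, no loop
d <- a * b

# Both approaches give the same values
all(d_loop == d)
## [1] TRUE
```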

You can also multiply a vector with a single number.

a * 3
## [1]  3  6  9 12 15

In the examples above the computations used either vectors of the same length, or one of the vectors had length 1. You can also use algebraic computations with vectors of different lengths, because the shorter one will be “recycled”. R issues a warning only if the length of the longer vector is not a multiple of the length of the shorter one. This is a great feature when you need it, but it may also make you overlook errors when your data are not what you think they are.

a + c(1,10)
## Warning in a + c(1, 10): longer object length is not a multiple of shorter
## object length
## [1]  2 12  4 14  6

No warning here:

1:6 + c(0,10)
## [1]  1 12  3 14  5 16
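To make the recycling visible, the sketch below writes out the repeated vector explicitly with rep. The names x, y and y_full are used only for this illustration.

```r
x <- 1:6
y <- c(0, 10)

# What R does implicitly: repeat the shorter vector to the length of the longer one
y_full <- rep(y, length.out = length(x))
y_full
## [1]  0 10  0 10  0 10

# The implicit and explicit versions agree
all(x + y == x + y_full)
## [1] TRUE
```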

Logical comparisons

It is very common in computer programs to test for (in)equality or whether a value is greater or smaller than another value.

Recall that == is used to test for equality

a <- 1:5
b <- 6:10
a == 2
## [1] FALSE  TRUE FALSE FALSE FALSE

And inequality is evaluated with !=

a != 2
## [1]  TRUE FALSE  TRUE  TRUE  TRUE

“Less than” is <, “less than or equal” is <=, and “greater than or equal” is >=.

a < 3
## [1]  TRUE  TRUE FALSE FALSE FALSE
b >= 9
## [1] FALSE FALSE FALSE  TRUE  TRUE

& is Boolean “AND”, and | is Boolean “OR”.

a
## [1] 1 2 3 4 5
b
## [1]  6  7  8  9 10
b > 6 & b < 8
## [1] FALSE  TRUE FALSE FALSE FALSE
# combining a and b
b > 9 | a <= 2
## [1]  TRUE  TRUE FALSE FALSE  TRUE
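Because a comparison returns a logical vector, its result can be passed to other functions. The short sketch below, assuming the same a and b as above, uses any, all and sum (where each TRUE counts as 1) to summarize such a vector.

```r
a <- 1:5
b <- 6:10

# Is at least one element of b greater than 9?
any(b > 9)
## [1] TRUE

# Are all elements of both vectors positive?
all(a > 0 & b > 0)
## [1] TRUE

# How many elements of a are greater than 2? (TRUE counts as 1)
sum(a > 2)
## [1] 3
```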

Functions

There are many functions that allow us to do vectorized algebra. For example:

sqrt(a)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
exp(a)
## [1]   2.718282   7.389056  20.085537  54.598150 148.413159

Not all functions return a vector of the same length. The following functions return just one or two numbers:

min(a)
## [1] 1
max(a)
## [1] 5
range(a)
## [1] 1 5
sum(a)
## [1] 15
mean(a)
## [1] 3
median(a)
## [1] 3
prod(a)
## [1] 120
sd(a)
## [1] 1.581139

If you cannot guess what prod and sd do, look them up in the help files (e.g. ?sd).
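As a check on what the help files describe, the sketch below reproduces both results by hand. The name sd_manual is introduced only for this illustration.

```r
a <- 1:5

# prod multiplies all elements together: 1 * 2 * 3 * 4 * 5
prod(a)
## [1] 120

# sd is the sample standard deviation, with n - 1 in the denominator
sd_manual <- sqrt(sum((a - mean(a))^2) / (length(a) - 1))
sd_manual
## [1] 1.581139
```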

Random numbers

It is common to create a vector of random numbers in data analysis, and also to create example data to demonstrate how a procedure works. To get 10 numbers sampled from the uniform distribution between 0 and 1 you can do

r <- runif(10)
r
##  [1] 0.42224924 0.28976593 0.39076520 0.92341038 0.01164684 0.13008961
##  [7] 0.17059007 0.63921797 0.67293907 0.58224416

For Normally distributed numbers, use rnorm

r <- rnorm(10, mean=10, sd=2)
r
##  [1] 12.032646  7.460689  9.450634  7.252248  9.606060  8.577337  8.273873
##  [8] 11.191274  9.139506 11.629766

If you run the functions above, you will get different numbers than the ones shown here. After all, they are random numbers! Modern data analysis methods use a lot of randomization, which can make it a challenge to exactly reproduce results. To allow for exact reproduction of examples or real data analysis, we often want to ensure that we take exactly the same random sample each time we run our code. To do that we use set.seed. This function initializes the random number generator to a specific point in a very long but fixed sequence of numbers. This is illustrated below.

set.seed(12)
runif(2)
## [1] 0.06936092 0.81777520
runif(3)
## [1] 0.9426217 0.2693819 0.1693481
runif(4)
## [1] 0.03389562 0.17878500 0.64166537 0.02287774
set.seed(12)
runif(1)
## [1] 0.06936092
runif(2)
## [1] 0.8177752 0.9426217
set.seed(12)
runif(3)
## [1] 0.06936092 0.81777520 0.94262173
runif(5)
## [1] 0.26938188 0.16934812 0.03389562 0.17878500 0.64166537

Note that each time set.seed is called with the same value, the same sequence of random numbers is generated. This is a very important feature, as it allows us to exactly reproduce results that involve random sampling. The seed number is arbitrary; a different seed number will give a different sequence.

set.seed(999)
runif(3)
## [1] 0.38907138 0.58306072 0.09466569
runif(5)
## [1] 0.85263123 0.78674676 0.11934226 0.60644699 0.08095691
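Rather than comparing the printed numbers by eye, you can confirm this behaviour with identical. This is a minimal sketch; the names first, second and third are used only for illustration.

```r
# Same seed, same draws
set.seed(12)
first <- runif(5)

set.seed(12)
second <- runif(5)

identical(first, second)
## [1] TRUE

# A different seed gives a different sequence
set.seed(999)
third <- runif(5)

identical(first, third)
## [1] FALSE
```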

Setting a seed lets you exactly reproduce results. Because the small variation between runs of your code is removed, you can check that everything still works as before. You may wonder how to choose the value of the seed. You could take the date (e.g. “20210329”), but it should not really matter. If you notice that your data analysis gives materially different results based on your choice of seed, then you need to reconsider what you are doing, as your results are not stable (or you may need to run the analysis many times with different seeds).
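One way to check how much a result depends on the seed is to repeat the analysis with several seeds and look at the spread. The sketch below is only an illustration: run_analysis is a hypothetical stand-in for a real analysis (here, the mean of 100 normally distributed numbers), and the choice of 20 seeds is arbitrary.

```r
# Hypothetical "analysis": the mean of 100 draws from a normal distribution
run_analysis <- function(seed) {
  set.seed(seed)
  mean(rnorm(100, mean = 10, sd = 2))
}

# Repeat the analysis with several arbitrary seeds
seeds <- 1:20
results <- sapply(seeds, run_analysis)

# Spread of the results across seeds
range(results)
sd(results)
```

If the spread is small relative to the effect you care about, the choice of seed does not matter much; if it is large, the result is not stable.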

Matrices

Computation with matrices is also ‘vectorized’. For example, with matrix m you can do m * 5 to multiply all values of m by 5, or do m^2 or m * m to square the values of m.

# set up an example matrix
m <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE)
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
m * 2
##      [,1] [,2] [,3]
## [1,]    2    4    6
## [2,]    8   10   12
m^2
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36

We can also do math with a matrix and a vector. Note, again, that computation with matrices in R is column-wise, and that shorter vectors are recycled.

m * 1:2
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    8   10   12
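To see why the result looks like this, it helps to look at the order in which R stores the values of a matrix. The sketch below, assuming the same m as above, shows that order and builds the recycled vector explicitly as a matrix. The name v_full is used only for illustration.

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

# R stores a matrix column by column; recycling follows this order
as.vector(m)
## [1] 1 4 2 5 3 6

# Recycling 1:2 in that order is the same as filling a matrix of the same shape
v_full <- matrix(1:2, nrow = nrow(m), ncol = ncol(m))
v_full
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2

all(m * 1:2 == m * v_full)
## [1] TRUE
```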

Can you predict the result of this multiplication?

m * 1:4

You can multiply two matrices.

m * m
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36

Note that this is “cell by cell” multiplication. For ‘matrix multiplication’ in the mathematical sense, you need to use the %*% operator.

m %*% t(m)
##      [,1] [,2]
## [1,]   14   32
## [2,]   32   77
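Each element of the matrix product is the sum of the products of a row of the first matrix and a column of the second. The sketch below, assuming the same m as above, checks one element by hand and shows how the dimensions of the result depend on the order of the operands.

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

# Element [1, 2] of m %*% t(m) combines row 1 of m with column 2 of t(m),
# which is row 2 of m: 1*4 + 2*5 + 3*6
sum(m[1, ] * m[2, ])
## [1] 32

# (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
dim(m %*% t(m))
## [1] 2 2

# Reversing the order gives a 3 x 3 matrix instead
dim(t(m) %*% m)
## [1] 3 3
```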