Fundamentals¶

Processes and patterns¶

This handout accompanies Chapter 4 in O’Sullivan and Unwin (2010) by working out the examples in R. Figure 4.2 (on page 96) shows values for a deterministic spatial process z = 2x + 3y. Below are two ways to create such a plot in R. The first one uses “base” R. That is, it does not use explicitly spatial objects (classes). I use the expand.grid function to create a matrix with two columns with all combinations of 0:7 with 0:7:

x <- 0:7
y <- 0:7
xy <- expand.grid(x, y)
colnames(xy) <- c('x', 'y')
head(xy)
##   x y
## 1 0 0
## 2 1 0
## 3 2 0
## 4 3 0
## 5 4 0
## 6 5 0

Now we can use these values to compute the values of z that correspond to the values of x (the first column of object xy) and y (the second column).

z <- 2*xy[,1] + 3*xy[,2]
zm <- matrix(z, ncol=8)

Here I do the same thing, but using a function of my own making (called detproc); just to get used to the idea of writing and using your own functions.

detproc <- function(x, y) {
  z <- 2*x + 3*y
  return(z)
}
v <- detproc(xy[,1], xy[,2])
zm <- matrix(v, ncol=8)

Below, I use a trick “plot(x, y, type=‘n’)” to set up a plot with the correct axes, that is otherwise blank. I do this because I do not want any dots on the plot. Instead of showing dots, I use the ‘text’ function to add the labels (as in the book).

plot(x, y, type='n')
text(xy[,1], xy[,2], z)
contour(x, y, zm, add=TRUE, lty=2)

Now, let’s do the same thing as above, but now by using a spatial data approach. Instead of a matrix, we use RasterLayer objects. First we create an empty raster with eight rows and columns, and with x and y going from 0 to 7. The ‘init’ function sets the values of the cells to either the x or the y coordinate (or something else, see ?init).

library(raster)
r <- raster(xmn=0, xmx=7, ymn=0, ymx=7, ncol=8, nrow=8)
X <- init(r, 'x')
Y <- init(r, 'y')
par(mfrow=c(1,2))
plot(X, main='x')
plot(Y, main='y')

We can use algebraic exprssions with RasterLayer objects

Z <- 2*X + 3*Y

Do you think it is possible to do Z <- detproc(X, Y)?

Plot the result.

plot(Z)
text(Z, cex=.75)
contour(Z, add=T, labcex=1, lwd=2, col='red')

The above does not seem very interesting. But, if the process is complex, a map of the outcome of a deterministic process can actually be very interesting. For example, in ecology and associated sciences there are many ‘mechanistic’ (= process) models that dynamically (= over time) simulate ecosystem processes such as vegetation dynamics and soil greenhouse gas emissions that depend on interaction of the weather, soil type, genotypes, and management; making it hard to predict the model outcome over space. In this context, stochasticity still exists through the input variables (e.g. rainfall). Many other models exist that have a deterministic and stochastic components (e.g. global climate models).

Below I follow the book by adding a stochastic element to the deterministic process about by adding variable ‘r’ to the equation: z = 2x + 3y + r; where r is a random value that can be -1 or +1. You will see that many examples of R functions use randomly generated values to illustrate how they work. But much real data analysis also depends on randomly selected variables, often as a ‘null model’ (such as CSR) to compare with an observed data set.

There are different ways to get random numbers, and you should pay attention to that. The sample function returns randomly selected values from a set that you provide (by default this is done without replacement). We need a random value for each cell of raster ‘r’, and assign these to a new RasterLayer with the same properties (spatial extent and resolution). When you work with using random values, the results will be different each time you run some code (that is the point); but sometimes it is desireable to recreate exactly the same random sequence. Function set.seed allows you to do that (after all, in computers we can only create pseudo-random values).

set.seed(987)
s <- sample(c(-1, 1), ncell(r), replace=TRUE)
s[1:8]
## [1] -1 -1 -1 -1  1 -1  1 -1
R <- setValues(r, s)
plot(R)

Now we can solve the formula and look at the result

Z <- 2*X + 3*Y + R
plot(Z)
text(Z, cex=.75)
contour(Z, add=T, labcex=1, lwd=2, col='red')

The figure above is a pattern from a (partly) random process. The process can generate other patterns, as is shown below. Because we want to repeat the same thing (process, code) a number of times, it is convenient to define a (pattern generating) function.

f <- function() {
    s <- sample(c(-1, 1), ncell(r), replace=TRUE)
    S <- setValues(r, s)
    Z <- 2*X + 3*Y + S
    return(Z)
}

We can call function f as many times as we like, below I use it four times. Note that the functions has no arguments, but we still need to use the parenthesis f() to distinguish it from f, the function definition.

set.seed(777)
par(mfrow=c(2,2), mai=c(0.5,0.5,0.5,0.5))
for (i in 1:4) {
    pattern <- f()
    plot(pattern)
    text(pattern, cex=.75)
    contour(pattern, add=TRUE, labcex=1, lwd=2, col='red')
}

As you can see, there is variation between the four plots, but not much. The deterministic process has an overriding influence as the random component only adds or subtracts a value of 1.

So far we have created regular, gridded, patterns. Locations of ‘events’ normally do not follow such a pattern (but they may be summarized that way). Here is how you can create simple dot maps of random events (following box “All the way: a chance map”; OSU page 98). I first create a function for a complete spatial random (CSR) process. Note the use of the runif (probably pronounced as r-unif as it stands for ‘random uniform’, there is also a rnorm, rpois, …) function to create the x and y coordinates. For convenience, this function also plots the value. That is not typical, as in many cases you may want to create many random draws, but not plot them all. Therefore I added the Boolean argument ‘plot’ (with default value FALSE) to the function.

csr <- function(n, r=99, plot=FALSE) {
    x <- runif(n, max=r)
    y <- runif(n, max=r)
    if (plot) {
        plot(x, y, xlim=c(0,r), ylim=c(0,r))
    }
}

Let’s run the function four times; to create four realizations. Again, I use set.seed to assure that the maps are always the same “random” draws.

set.seed(0)
par(mfrow=c(2,2), mai=c(.5, .5, .5, .5))
for (i in 1:4) {
    csr(50, plot=TRUE)
}

Predicting patterns¶

I first show how you can recreate Table 4.1 with R. Note the use of function ‘choose’ to get the ‘binomial coefficients’ from this formula.

\[\frac{n!}{k!(n-k)!} = \binom{n}{k}\]

Everything else is just basic math.

events <- 0:10
combinations <- choose(10, events)
prob1 <- (1/8)^events
prob2 <- (7/8)^(10-events)
Pk <- combinations * prob1 * prob2
d <- data.frame(events, combinations, prob1, prob2, Pk)
round(d, 8)
##    events combinations      prob1     prob2         Pk
## 1       0            1 1.00000000 0.2630756 0.26307558
## 2       1           10 0.12500000 0.3006578 0.37582225
## 3       2           45 0.01562500 0.3436089 0.24160002
## 4       3          120 0.00195312 0.3926959 0.09203810
## 5       4          210 0.00024414 0.4487953 0.02300953
## 6       5          252 0.00003052 0.5129089 0.00394449
## 7       6          210 0.00000381 0.5861816 0.00046958
## 8       7          120 0.00000048 0.6699219 0.00003833
## 9       8           45 0.00000006 0.7656250 0.00000205
## 10      9           10 0.00000001 0.8750000 0.00000007
## 11     10            1 0.00000000 1.0000000 0.00000000
sum(d$Pk)
## [1] 1

Table 4.1 explains how value for the binomial distribution can be computed. As this is a ‘well-known’ distribution (after all, it is the distribution you get when tossing a fair coin) there is a function to compute this directly.

b <- dbinom(0:10, 10, 1/8)
round(b, 8)
##  [1] 0.26307558 0.37582225 0.24160002 0.09203810 0.02300953 0.00394449
##  [7] 0.00046958 0.00003833 0.00000205 0.00000007 0.00000000

Similar functions exists for other commonly used distributions such as the uniform, normal, and Poisson distribution.

Now, let’s generate some quadrat counts and then compare the generated (observed) frequencies with the theoretical expectation. First the random points.

set.seed(1234)
x <- runif(50) * 99
y <- runif(50) * 99

Then the quadrats.

r <- raster(xmn=0, xmx=99, ymn=0, ymx=99, ncol=10, nrow=10)
quads <- rasterToPolygons(r)

And a plot to inspect them.

plot(x, y, xlim=c(0,99), ylim=c(0,99), col='red', pch=20, axes=F)
plot(quads, add=TRUE, border='gray')

A standard question is now to ask whether it is likely that this pattern was generated by random process. We can do this by comparing the observed frequencies with the theoretically expected frequencies. Note that in a coin toss the probability of success is 1/2; here the probability of success (the random chance that a point lands in quadrat is 1/(number of quadrats).

First I count the number of points by quadrat (grid cell).

xy <- cbind(x,y)
p <- rasterize(xy, r, fun='count', background=0)
plot(p)
plot(quads, add=TRUE, border='gray')
points(x, y, pch=20)

Then I get the frequency of the counts and make a barplot.

f <- freq(p)
f
##      value count
## [1,]     0    65
## [2,]     1    26
## [3,]     2     6
## [4,]     3     1
## [5,]     4     1
## [6,]     5     1
barplot(p)

To compare these observed values to the expected frequencies from the binomial distribution we can use expected <- dbinom(n, size, prob). In the book, this is P(k, n, x).

n <- 0:8
prob <- 1 / ncell(r)
size <- 50
expected <- dbinom(n, size, prob)
round(expected, 5)
## [1] 0.60501 0.30556 0.07562 0.01222 0.00145 0.00013 0.00001 0.00000 0.00000
plot(n, expected, cex=2, pch='x', col='blue')

These numbers indicate that you would expect that most quadrats would have a point count of zero, a few would have 1 point, and very few more than that. Six or more points in a single cell is highly unlikely to happen if this data generating process in spatially random.

m <- rbind(f[,2]/100, expected[1:nrow(f)])
bp <- barplot(m, beside=T, names.arg =1:nrow(f), space=c(0.1, 0.5),
   ylim=c(0,0.7), col=c('red', 'blue'))
text(bp, m, labels=round(m, 2), pos = 3, cex = .75)
legend(11, 0.7, c('Observed', 'Expected'), fill=c('red', 'blue'))

On page 106 it is discussed that the Poisson distribution can be a good approximation of the binomial distribution. Let’s get the expected values for the Poisson distribution. The intensity \(\lambda\) (lambda) is the number of points divided by the number of quadrats.

poisexp <- dpois(0:8, lambda=50/100)
poisexp
## [1] 6.065307e-01 3.032653e-01 7.581633e-02 1.263606e-02 1.579507e-03
## [6] 1.579507e-04 1.316256e-05 9.401827e-07 5.876142e-08
plot(expected, poisexp, cex=2)
abline(0,1)

Pretty much the same, indeed.

Random Lines¶

See pp 110-111. Here is a function that draws random lines through a rectangle. It first takes a random point within the rectangle, and a random angle. Then it uses basic trigonometry to find the line segments (where the line intersects with the rectangle). It returns the line length, or the coordinates of the intersection and random point, for plotting.

randomLineInRectangle <- function(xmn=0, xmx=0.8, ymn=0, ymx=0.6, retXY=FALSE) {
    x <- runif(1, xmn, xmx)
    y <- runif(1, ymn, ymx)
    angle <- runif(1, 0, 359.99999999)
    if (angle == 0) {
        # vertical line, tan is infinite
        if (retXY) {
            xy <- rbind(c(x, ymn), c(x, y), c(x, ymx))
            return(xy)
        }
        return(ymx - ymn)
    }
    tang <- tan(pi*angle/180)
    x1 <- max(xmn, min(xmx, x - y / tang))
    x2 <- max(xmn, min(xmx, x + (ymx-y) / tang))
    y1 <- max(ymn, min(ymx, y - (x-x1) * tang))
    y2 <- max(ymn, min(ymx, y + (x2-x) * tang))
    if (retXY) {
        xy <- rbind(c(x1, y1), c(x, y), c(x2, y2))
        return(xy)
    }
    sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

We can test it:

randomLineInRectangle()
## [1] 0.4953701
randomLineInRectangle()
## [1] 0.4470846
set.seed(999)
plot(NA, xlim=c(0, 0.8), ylim=c(0, 0.6), xaxs="i", yaxs="i", xlab="", ylab="", las=1)
for (i in 1:4) {
    xy <- randomLineInRectangle(retXY=TRUE)
    lines(xy, lwd=2, col=i)
    #points(xy, cex=2, pch=20, col=i)
}

And plot the density function

r <- replicate(10000, randomLineInRectangle())
hist(r, breaks=seq(0,1,0.01), xlab="Line length", main="", freq=FALSE)

Note that the density function is similar to Fig 4.6, but not the same (e.g. for values near zero).

Sitting comfortably?¶

See the box on page 110-111. There are four seats on the table. Let’s number the seats 1, 2, 3, and 4. If two seats are occupied and the absolute difference between the seat numbers is 2, the customers sit straight across from each other. Otherwise, they sit across the corner from each other. What is the expected frequency of customers sitting straight across from each other? Let’s simulate.

set.seed(0)
x <- replicate(10000, abs(diff(sample(1:4, 2))))
sum(x==2) / length(x)
## [1] 0.3408

That is one third, as expected given that this represents two of the six ways you can sit.

Random areas¶

Here is how you can create a function to create a “random chessboard” as described on page 114.

r <- raster(xmn=0, xmx=1, ymn=0, ymx=1, ncol=8, nrow=8)
p <- rasterToPolygons(r)
chess <- function() {
    s <- sample(c(-1, 1), 64, replace=TRUE)
    values(r) <- s
    plot(r, col=c('black', 'white'), legend=FALSE, axes=FALSE, box=FALSE)
    plot(p, add=T, border='gray')
}

And create four realizations:

set.seed(0)
par(mfrow=c(2,2), mai=c(0.2, 0.1, 0.2, 0.1))
for (i in 1:4) {
    chess()
}

This is how you can create a random field (page 114/115)

r <- raster(xmn=0, xmx=1, ymn=0, ymx=1, ncol=20, nrow=20)
values(r) <- rnorm(ncell(r), 0, 2)
plot(r)
contour(r, add=T)

But a more realistic random field will have some spatial autocorrelation. We can create that with the focal function.

ra <- focal(r, w=matrix(1/9, nc=3, nr=3), na.rm=TRUE, pad=TRUE)
ra <- focal(ra, w=matrix(1/9, nc=3, nr=3), na.rm=TRUE, pad=TRUE)
plot(ra)

Questions

Use the examples provided above to write a script that follows the ‘thought exercise to fix ideas’ on page 98 of OSU.
Use the example of the CSR points maps to write a script that uses a normal distribution, rather than a random uniform distribution (also see box ‘different distributions’; on page 99 of OSU).
Do the generated chess boards look (positively or negatively) spatially autocorrelated?
How would you, conceptually, statistically test whether the real chessboard used in games is generated by an independent random process?
(bonus) Can you explain the odd distribution pattern of random line lengths inside a rectangle?