5. Point pattern analysis¶

5.1 Introduction¶

Get the data¶

We are using a dataset of crimes in a city. You can download the data here: crime, city, census, or with the script below.

dir.create('data', showWarnings = FALSE)
files <- c('city.rds', 'crime.rds', 'census2000.rds')
for (filename in files) {
localfile <- file.path('data', filename)
if (!file.exists(localfile)) {
}
}


library(raster)
## Warning in readRDS("data/city.rds"): invalid or incomplete compressed data


Here is a map of both datasets.

plot(city, col='light blue')
## Error in plot(city, col = "light blue"): object 'city' not found
points(crime, col='red', cex=.5, pch='+')
## Error in points(crime, col = "red", cex = 0.5, pch = "+"): object 'crime' not found


A sorted table of the incidence of crime types.

tb <- sort(table(crime$CATEGORY))[-1] ## Error in table(crime$CATEGORY): object 'crime' not found
tb


Let’s get the coordinates of the crime data, and for this exercise, remove duplicate crime locations. These are the ‘events’ we will use below (later we’ll go back to the full data set).

xy <- coordinates(crime)
dim(xy)
xy <- unique(xy)
dim(xy)


5.2 Basic statistics¶

Compute the mean center and standard distance for the crime data (see page 125 of OSU).

# mean center
mc <- apply(xy, 2, mean)
# standard distance
sd <- sqrt(sum((xy[,1] - mc[1])^2 + (xy[,2] - mc[2])^2) / nrow(xy))


Plot the data to see what we’ve got. I add a summary circle (as in Fig 5.2) by dividing the circle in 360 points and compute bearing in radians. I do not think this is particularly helpful, but it might be in other cases. And it is always fun to figure out how to do tis.

plot(city, col='light blue')
## Error in plot(city, col = "light blue"): object 'city' not found
points(crime, cex=.5)
points(cbind(mc[1], mc[2]), pch='*', col='red', cex=5)

# make a circle
bearing <- 1:360 * pi/180
cx <- mc[1] + sd * cos(bearing)
cy <- mc[2] + sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='red', lwd=2)
## Error in lines(circle, col = "red", lwd = 2): object 'circle' not found


5.3 Density¶

Here is a basic approach to computing point density.

CityArea <- area(city)
dens <- nrow(xy) / CityArea


Question 1a:What is the unit of ‘dens’?

Question 1b:What is the number of crimes per square km?

To compute quadrat counts (as on p.127-130) I first create quadrats (a RasterLayer). I get the extent for the raster from the city polygon, and then assign an an arbitrary resolution of 1000. (In real life one should always try a range of resolutions, I think).

r <- raster(city)
res(r) <- 1000
r


To find the cells that are in the city, and for easy display, I create polygons from the RasterLayer.

r <- rasterize(city, r)
plot(r)
points(crime, col='red', cex=.5)
## Error in points(crime, col = "red", cex = 0.5): object 'crime' not found


The number of events in each quadrat can be counted using the ‘rasterize’ function. That function can be used to summarize the number of points within each cell, but also to compute statistics based on the ‘marks’ (attributes). For example we could compute the number of different crime types) by changing the ‘fun’ argument to another function (see ?rasterize).

nc <- rasterize(coordinates(crime), r, fun='count', background=0)
plot(nc)


nc has crime counts. As we only have data for the city, the areas outside of the city need to be excluded. We can do that with the mask function (see ?mask).

ncrimes <- mask(nc, r)
plot(ncrimes)


Better. Now the frequencies.

f <- freq(ncrimes, useNA='no')
plot(f, pch=20)


Does this look like a pattern you would have expected? Now compute average number of cases per quadrat.

# number of quadrats
# number of cases
cases <- sum(f[,1] * f[,2])
mu


And create a table like Table 5.1 on page 130

ff <- data.frame(f)
colnames(ff) <- c('K', 'X')
## Error in colnames<-(*tmp*, value = c("K", "X")): attempt to set 'colnames' on an object with less than two dimensions
ff$Kmu <- ff$K - mu
## Error in ff$K:$ operator is invalid for atomic vectors
ff$Kmu2 <- ff$Kmu^2
## Error in ff$Kmu:$ operator is invalid for atomic vectors
ff$XKmu2 <- ff$Kmu2 * ff$X ## Error in ff$Kmu2: $operator is invalid for atomic vectors head(ff) ## [1] "Chapter5.rmd" "Chapter7.rmd" "Chapter8.rmd" "Chapter9.rmd"  The observed variance s2 is s2 <- sum(ff$XKmu2) / (sum(ff$X)-1) ## Error in ff$XKmu2: $operator is invalid for atomic vectors s2 ## Error in eval(expr, envir, enclos): object 's2' not found  And the VMR is VMR <- s2 / mu ## Error in eval(expr, envir, enclos): object 's2' not found VMR ## Error in eval(expr, envir, enclos): object 'VMR' not found  Question 2:What does this VMR score tell us about the point pattern? 5.3 Distance based measures¶ As we are using a planar coordinate system we can use the dist function to compute the distances between pairs of points. Contrary to what the books says, if we were using longitude/latitude we could compute distance via spherical trigonometry functions. These are available in the sp, raster, and notably the geosphere package (among others). For example, see raster::pointDistance. d <- dist(xy) ## Error in as.matrix(x): object 'xy' not found class(d) ## Error in eval(expr, envir, enclos): object 'd' not found  I want to coerce the dist object to a matrix, and ignore distances from each point to itself (the zeros on the diagonal). dm <- as.matrix(d) ## Error in as.matrix(d): object 'd' not found dm[1:5, 1:5] ## Error in eval(expr, envir, enclos): object 'dm' not found diag(dm) <- NA ## Error in diag(dm) <- NA: object 'dm' not found dm[1:5, 1:5] ## Error in eval(expr, envir, enclos): object 'dm' not found  To get, for each point, the minimum distance to another event, we can use the ‘apply’ function. Think of the rows as each point, and the columns of all other points (vice versa could also work). dmin <- apply(dm, 1, min, na.rm=TRUE) ## Error in apply(dm, 1, min, na.rm = TRUE): object 'dm' not found head(dmin) ## Error in head(dmin): object 'dmin' not found  Now it is trivial to get the mean nearest neighbour distance according to formula 5.5, page 131. mdmin <- mean(dmin) ## Error in mean(dmin): object 'dmin' not found  Do you want to know, for each point, Which point is its nearest neighbour? Use the ‘which.min’ function (but note that this ignores the possibility of multiple points at the same minimum distance). wdmin <- apply(dm, 1, which.min) ## Error in apply(dm, 1, which.min): object 'dm' not found  And what are the most isolated cases? That is the furtest away from their nearest neigbor. I plot the top 25. A bit complicated. plot(city) ## Error in plot(city): object 'city' not found points(crime, cex=.1) ## Error in points(crime, cex = 0.1): object 'crime' not found ord <- rev(order(dmin)) ## Error in order(dmin): object 'dmin' not found far25 <- ord[1:25] ## Error in eval(expr, envir, enclos): object 'ord' not found neighbors <- wdmin[far25] ## Error in eval(expr, envir, enclos): object 'wdmin' not found points(xy[far25, ], col='blue', pch=20) ## Error in points(xy[far25, ], col = "blue", pch = 20): object 'xy' not found points(xy[neighbors, ], col='red') ## Error in points(xy[neighbors, ], col = "red"): object 'xy' not found # drawing the lines, easiest via a loop for (i in far25) { lines(rbind(xy[i, ], xy[wdmin[i], ]), col='red') } ## Error in eval(expr, envir, enclos): object 'far25' not found  Note that some points, but actually not that many, are used as isolated and as a neighbor to an isolated points. Now on to the G function max(dmin) ## Error in eval(expr, envir, enclos): object 'dmin' not found # get the unique distances (for the x-axis) distance <- sort(unique(round(dmin))) ## Error in unique(round(dmin)): object 'dmin' not found # compute how many cases there with distances smaller that each x Gd <- sapply(distance, function(x) sum(dmin < x)) ## Error in FUN(X[[i]], ...): object 'dmin' not found # normalize to get values between 0 and 1 Gd <- Gd / length(dmin) ## Error in eval(expr, envir, enclos): object 'Gd' not found plot(distance, Gd) ## Error in plot(distance, Gd): object 'Gd' not found # using xlim to exclude the extremes plot(distance, Gd, xlim=c(0,500)) ## Error in plot(distance, Gd, xlim = c(0, 500)): object 'Gd' not found  Here is a function to show these values in a more standard way. stepplot <- function(x, y, type='l', add=FALSE, ...) { x <- as.vector(t(cbind(x, c(x[-1], x[length(x)])))) y <- as.vector(t(cbind(y, y))) if (add) { lines(x,y, ...) } else { plot(x,y, type=type, ...) } }  And use it for our G function data. stepplot(distance, Gd, type='l', lwd=2, xlim=c(0,500)) ## Error in x[-1]: object of type 'closure' is not subsettable  The steps are so small in our data, that you hardly see the difference. I use the centers of previously defined raster cells to compute the F function. # get the centers of the 'quadrats' (raster cells) p <- rasterToPoints(r) ## Error in nlayers(x): object 'r' not found # compute distance from all crime sites to these cell centers d2 <- pointDistance(p[,1:2], xy, longlat=FALSE) ## Error in .pointsToMatrix(p1): object 'p' not found # the remainder is similar to the G function Fdistance <- sort(unique(round(d2))) ## Error in unique(round(d2)): object 'd2' not found mind <- apply(d2, 1, min) ## Error in apply(d2, 1, min): object 'd2' not found Fd <- sapply(Fdistance, function(x) sum(mind < x)) ## Error in lapply(X = X, FUN = FUN, ...): object 'Fdistance' not found Fd <- Fd / length(mind) ## Error in eval(expr, envir, enclos): object 'Fd' not found plot(Fdistance, Fd, type='l', lwd=2, xlim=c(0,3000)) ## Error in plot(Fdistance, Fd, type = "l", lwd = 2, xlim = c(0, 3000)): object 'Fdistance' not found  Compute the expected distributon (5.12 on page 145) ef <- function(d, lambda) { E <- 1 - exp(-1 * lambda * pi * d^2) } expected <- ef(0:2000, dens) ## Error in ef(0:2000, dens): object 'dens' not found  Now, let’s combine F and G on one plot. plot(distance, Gd, type='l', lwd=2, col='red', las=1, ylab='F(d) or G(d)', xlab='Distance', yaxs="i", xaxs="i") ## Error in plot(distance, Gd, type = "l", lwd = 2, col = "red", las = 1, : object 'Gd' not found lines(Fdistance, Fd, lwd=2, col='blue') ## Error in lines(Fdistance, Fd, lwd = 2, col = "blue"): object 'Fdistance' not found lines(0:2000, expected, lwd=2) ## Error in xy.coords(x, y): object 'expected' not found legend(1200, .3, c(expression(italic("G")["d"]), expression(italic("F")["d"]), 'expected'), lty=1, col=c('red', 'blue', 'black'), lwd=2, bty="n") ## Error in strwidth(legend, units = "user", cex = cex, font = text.font): plot.new has not been called yet  Question 3: What does this plot suggest about the point pattern? Finally, let’s compute K. Note that I use the original distance matrix ‘d’ here. distance <- seq(1, 30000, 100) Kd <- sapply(distance, function(x) sum(d < x)) # takes a while ## Error in FUN(X[[i]], ...): object 'd' not found Kd <- Kd / (length(Kd) * dens) ## Error in eval(expr, envir, enclos): object 'Kd' not found plot(distance, Kd, type='l', lwd=2) ## Error in plot(distance, Kd, type = "l", lwd = 2): object 'Kd' not found  Question 4: Create a single random pattern of events for the city, with the same number of events as the crime data (object xy). Use function ‘spsample’ Question 5: Compute the G function, and plot it on a single plot, together with the G function for the observed crime data, and the theoretical expectation (formula 5.12). Question 6: (Difficult!) Do a Monte Carlo simulation (page 149) to see if the ‘mean nearest distance’ of the observed crime data is significantly different from a random pattern. Use a ‘for loop’. First write ‘pseudo-code’. That is, say in natural language what should happen. Then try to write R code that implements this. 5.5 Spatstat package¶ Above we did some ‘home-brew’ point pattern analysis, we will now use the spatstat package. In research you would normally use spatstat rather than your own functions, at least for standard analysis. I showed how you make some of these functions in the previous sections, because understanding how to go about that may allow you to take things in directions that others have not gone. The good thing about spatstat is that it very well documented (see http://spatstat.github.io/). The bad thing is that it uses an entirly different sets of classes (ways to represent spatial data) that we we will use in all other labs (classes from sp and raster); but it is not hard to get used to that. library(spatstat)  We start with making make a Kernel Density raster. I first create a ‘ppp’ (point pattern) object, as defined in the spatstat package. A ppp object has the coordinates of the points and the analysis ‘window’ (study region). To assign the points locations we need to extract the coordinates from our SpatialPoints object. To set the window, we first need to to coerce our SpatialPolygons into an ‘owin’ object. We need a function from the maptools package for this coercion. Coerce from SpatialPolygons to an object of class “owin” (observation window) library(maptools) cityOwin <- as.owin(city) ## Error in as.owin(city): object 'city' not found class(cityOwin) ## Error in eval(expr, envir, enclos): object 'cityOwin' not found cityOwin ## Error in eval(expr, envir, enclos): object 'cityOwin' not found  Extract coordinates from SpatialPointsDataFrame: pts <- coordinates(crime) ## Error in coordinates(crime): object 'crime' not found head(pts) ## Error in head(pts): object 'pts' not found  Now we can create a ‘ppp’ (point pattern) object p <- ppp(pts[,1], pts[,2], window=cityOwin) ## Error in verifyclass(window, "owin"): object 'cityOwin' not found class(p) ## Error in eval(expr, envir, enclos): object 'p' not found p ## Error in eval(expr, envir, enclos): object 'p' not found plot(p) ## Error in plot(p): object 'p' not found  Note the warning message about ‘illegal’ points. Do you see them and do you understand why they are illegal? Having all the data well organized, it is now easy to compute Kernel Density ds <- density(p) ## Error in density(p): object 'p' not found class(ds) ## Error in eval(expr, envir, enclos): object 'ds' not found plot(ds, main='crime density') ## Error in plot(ds, main = "crime density"): object 'ds' not found  Density is the number of points per unit area. Let’s ceck if the numbers makes sense, by adding them up and mulitplying with the area of the raster cells. I use raster package functions for that. nrow(pts) ## Error in nrow(pts): object 'pts' not found r <- raster(ds) ## Error in raster(ds): object 'ds' not found s <- sum(values(r), na.rm=TRUE) ## Error in values(r): object 'r' not found s * prod(res(r)) ## Error in eval(expr, envir, enclos): object 's' not found  Looks about right. We can also get the information directly from the “im” (image) object str(ds) ## Error in str(ds): object 'ds' not found sum(ds$v, na.rm=TRUE) * ds$xstep * ds$ystep
p$n ## Error in eval(expr, envir, enclos): object 'p' not found  Here’s another, lenghty, example of generalization. We can interpolate population density from (2000) census data; assigning the values to the centroid of a polygon (as explained in the book, but not a great technique). We use a shapefile with census data. census <- readRDS("data/census2000.rds") ## Error in readRDS("data/census2000.rds"): error reading from connection  To compute population density for each census block, we first need to get the area of each polygon. I transform density from persons per feet2 to persons per mile2, and then compute population density from POP2000 and the area census$area <- area(census)
census$area <- census$area/27878400
census$dens <- census$POP2000 / census$area ## Error in eval(expr, envir, enclos): object 'census' not found  Now to get the centroids of the census blocks we can use the ‘coordinates’ function again. Note that it actually does something quite different (with a SpatialPolygons* object) then in the case above (with a SpatialPoints* object). p <- coordinates(census) ## Error in coordinates(census): object 'census' not found head(p) ## Error in head(p): object 'p' not found  To create the ‘window’ we dissolve all polygons into a single polygon. win <- aggregate(census) ## Error in aggregate(census): object 'census' not found  Let’s look at what we have: plot(census) ## Error in plot(census): object 'census' not found points(p, col='red', pch=20, cex=.25) ## Error in points(p, col = "red", pch = 20, cex = 0.25): object 'p' not found plot(win, add=TRUE, border='blue', lwd=3) ## Error in plot(win, add = TRUE, border = "blue", lwd = 3): object 'win' not found  Now we can use ‘smooth.ppp’ to interpolate. Population density at the points is referred to as the ‘marks’ owin <- as.owin(win) ## Error in as.owin(win): object 'win' not found pp <- ppp(p[,1], p[,2], window=owin, marks=census$dens)
## Error in verifyclass(window, "owin"): argument 'window' is not of class 'owin'
pp


Note the warning message: “1 point was rejected as lying outside the specified window”. That is odd, there is a polygon that has a centroid that is outside of the polygon. This can happen with, e.g., kidney shaped polygons.

Let’s find and remove this point that is outside the study area.

sp <- SpatialPoints(p, proj4string=CRS(proj4string(win)))
i <- gIntersects(sp, win, byid=TRUE)
## Error in gIntersects(sp, win, byid = TRUE): could not find function "gIntersects"
which(!i)
## integer(0)


Let’s see where it is:

plot(census)
points(sp)
points(sp[!i,], col='red', cex=3, pch=20)
## Error in points(sp[!i, ], col = "red", cex = 3, pch = 20): object 'sp' not found


You can zoom in using the code below. After running the next line, click on your map twice to zoom to the red dot, otherwise you cannot continue:

zoom(census)

And add the red points again

points(sp[!i,], col='red')

To only use points that intersect with the window polygon, that is, where ‘i == TRUE’:

pp <- ppp(p[i,1], p[i,2], window=owin, marks=census$dens[i]) ## Error in verifyclass(window, "owin"): argument 'window' is not of class 'owin' plot(pp) ## Error in plot(pp): object 'pp' not found plot(city, add=TRUE) ## Error in plot(city, add = TRUE): object 'city' not found  And to get a smooth interpolation of population density. s <- smooth.ppp(pp) ## Error in smooth.ppp(pp): could not find function "smooth.ppp" plot(s) ## Error in plot(s): object 's' not found plot(city, add=TRUE) ## Error in plot(city, add = TRUE): object 'city' not found  Population density could establish the “population at risk” (to commit a crime) for certain crimes, but not for others. Maps with the city limits and the incidence of ‘auto-theft’, ‘drunk in public’, ‘DUI’, and ‘Arson’. par(mfrow=c(2,2), mai=c(0.25, 0.25, 0.25, 0.25)) for (offense in c("Auto Theft", "Drunk in Public", "DUI", "Arson")) { plot(city, col='grey') acrime <- crime[crime$CATEGORY == offense, ]
points(acrime, col = "red")
title(offense)
}


Create a marked point pattern object (ppp) for all crimes. It is important to coerce the marks to a factor variable.

crime$fcat <- as.factor(crime$CATEGORY)
## Error in as.factor(crime$CATEGORY): object 'crime' not found w <- as.owin(city) ## Error in as.owin(city): object 'city' not found xy <- coordinates(crime) ## Error in coordinates(crime): object 'crime' not found mpp <- ppp(xy[,1], xy[,2], window = w, marks=crime$fcat)


We can split the mpp object by category (crime)

spp <- split(mpp)
plot(spp[1:4], main='')


The crime density by category:

plot(density(spp[1:4]), main='')


And produce K-plots (with an envelope) for ‘drunk in public’ and ‘Arson’. Can you explain what they mean?

spatstat.options(checksegments = FALSE)
ktheft <- Kest(spp$"Auto Theft") ## Error in verifyclass(X, "ppp"): object 'spp' not found ketheft <- envelope(spp$"Auto Theft", Kest)
## Error in envelope(spp$"Auto Theft", Kest): object 'spp' not found ktheft <- Kest(spp$"Arson")
ketheft <- envelope(spp$"Arson", Kest) ## Error in envelope(spp$Arson, Kest): object 'spp' not found

par(mfrow=c(1,2))
plot(ktheft)
plot(ketheft)


Let’s try to answer the question you have been wanting to answer all along. Is population density a good predictor of being (booked for) “drunk in public” and for “Arson”? One approach is to do a Kolmogorov-Smirnov (‘kstest’) on ‘Drunk in Public’ and ‘Arson’, using population density as a covariate:

KS.arson <- cdf.test(spp$Arson, covariate=ds, test='ks') ## Error in cdf.test(spp$Arson, covariate = ds, test = "ks"): object 'spp' not found
KS.arson
KS.drunk <- cdf.test(spp$'Drunk in Public', covariate=ds, test='ks') ## Error in cdf.test(spp$"Drunk in Public", covariate = ds, test = "ks"): object 'spp' not found
KS.drunk

kc <- Kcross(mpp, i = "Drunk in Public", j = "Arson")