Unsupervised Classification¶
In this chapter we explore unsupervised classification. Various unsupervised classification algorithms exist, and the choice of algorithm affects the results. Here we use the k-means algorithm to illustrate the general principle.
For this example, we will follow the National Land Cover Database 2011 (NLCD 2011) classification scheme for a subset of the Central Valley regions. We use cloud-free composite image from Landsat 5 with 6 bands.
library(terra)
## terra 1.7.62
landsat5 <- rast('data/rs/centralvalley-2011LT5.tif')
names(landsat5) <- c('blue', 'green', 'red', 'NIR', 'SWIR1', 'SWIR2')
Question 1: Make a 3-band False Color Composite plot of ``landsat5``.
In unsupervised classification, we use the reflectance data, but we don’t supply any response data (that is, we do not identify any pixel as belonging to a particular class). This may seem odd, but it can be useful when we don’t have much prior knowledge of a study area. Or if you have broad knowledge of the distribution of land cover classes of interest, but no specific ground data.
The algorithm groups pixels with similar spectral characteristics into groups.
Learn more about K-means and other unsupervised-supervised algorithms here.
We will perform unsupervised classification on a spatial subset of the
ndvi
layer. Here is yet another way to compute ndvi. In this case we
do not use a separate function, but we use a direct algebraic notation.
ndvi <- (landsat5[['NIR']] - landsat5[['red']]) / (landsat5[['NIR']] + landsat5[['red']])
We will do kmeans
clustering of the ndvi
data. First we use
crop
to make a spatial subset of the ndvi
, to allow for faster
processing (you can select any SpatExtent
using the draw
function).
kmeans classification¶
# SpatExtent to crop ndvi layer
e <- ext(-121.807, -121.725, 38.004, 38.072)
# crop landsat by the extent
ndvi <- crop(ndvi, e)
ndvi
## class : SpatRaster
## dimensions : 252, 304, 1 (nrow, ncol, nlyr)
## resolution : 0.0002694946, 0.0002694946 (x, y)
## extent : -121.807, -121.725, 38.00413, 38.07204 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source(s) : memory
## varname : centralvalley-2011LT5
## name : NIR
## min value : -0.3360085
## max value : 0.7756007
# convert the raster to a data.frme
nr <- as.data.frame(ndvi, cell=TRUE)
str(nr)
## 'data.frame': 76608 obs. of 2 variables:
## $ cell: int 1 2 3 4 5 6 7 8 9 10 ...
## $ NIR : num 0.245 0.236 0.272 0.277 0.277 ...
Please note that values(ndvi)
converted the ndvi
SpatRaster to
an array (matrix). Now we will perform the kmeans
clustering on the
matrix and inspect the output.
# It is important to set the seed generator because `kmeans` initiates the centers in random locations
set.seed(99)
# Create 10 clusters, allow 500 iterations, start with 5 random sets using "Lloyd" method.
# Do not use the first column (cell number).
kmncluster <- kmeans(nr[,-1], centers=10, iter.max = 500, nstart = 5, algorithm="Lloyd")
# kmeans returns an object of class "kmeans"
str(kmncluster)
## List of 9
## $ cluster : int [1:76608] 4 4 3 3 3 3 3 4 4 4 ...
## $ centers : num [1:10, 1] 0.55425 0.00498 0.29997 0.20892 -0.20902 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "1" "2" "3" "4" ...
## .. ..$ : NULL
## $ totss : num 6459
## $ withinss : num [1:10] 5.69 6.13 4.91 4.9 5.75 ...
## $ tot.withinss: num 55.8
## $ betweenss : num 6403
## $ size : int [1:10] 8932 4550 7156 6807 11672 8624 8736 5040 9893 5198
## $ iter : int 108
## $ ifault : NULL
## - attr(*, "class")= chr "kmeans"
kmeans
returns an object with 9 elements. The length of the
cluster
element within kmncluster
is 76608 which same as length
of nr
created from the ndvi
. The cell values of
kmncluster$cluster
range between 1 to 10 corresponding to the input
number of cluster we provided in the kmeans
function.
kmncluster$cluster
indicates the cluster label for corresponding
pixel. We need to convert the kmncluster$cluster
values back to a
SpatRaster
of the same dimension as the ndvi
.
# Use the ndvi object to set the cluster values to a new raster
knr <- rast(ndvi, nlyr=1)
knr[nr$cell] <- kmncluster$cluster
knr
## class : SpatRaster
## dimensions : 252, 304, 1 (nrow, ncol, nlyr)
## resolution : 0.0002694946, 0.0002694946 (x, y)
## extent : -121.807, -121.725, 38.00413, 38.07204 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source(s) : memory
## varname : centralvalley-2011LT5
## name : NIR
## min value : 1
## max value : 10
We can see that knr
is a SpatRaster but we do not know which cluster
(1-10) belongs to what land cover class (and if it does belong to a
class that we would recognize). You can find that out by plotting them
side-by-side with a reference layers and using unique color for each
cluster.
# Define a color vector for 10 clusters (learn more about setting the color later)
mycolor <- c("#fef65b","#ff0000", "#daa520","#0000ff","#0000ff","#00ff00","#cbbeb5",
"#c3ff5b", "#ff7373", "#00ff00", "#808080")
par(mfrow = c(1,2))
plot(ndvi, col = rev(terrain.colors(10)), main = "Landsat-NDVI")
plot(knr, main = 'Unsupervised classification', col = mycolor, type="classes")
While for other purposes it is usually better to define more classes (and possibly merge classes later), a simple classification like this one could be useful, e.g., merge cluster 4 and 5 to construct a water mask for the year 2011.
You can change the colors in my mycolor
. Learn more about selecting
colors in R
here
and
here.
Question 2:Plot a true-color image of ‘landsat5’ for the subset (extent ‘e’) and result of ‘kmeans’ clustering side-by-side and make a table of land-use land-cover labels for the clusters (based on visual inspection). E.g. cluster 4 and 5 are water.