4. Unsupervised Image Classification

Introduction

In this chapter we explore unsupervised classification. Various unsupervised classification algorithms exist, and the choice of algorithm can affect the results. We will explore only one algorithm (k-means) to illustrate the general principle.

For this example, we will follow the National Land Cover Database 2011 (NLCD 2011) classification scheme for a subset of the Central Valley regions. You will use cloud-free composite image from Landsat 5 with 6 bands.

library(raster)
landsat5 <- stack("data/rs/centralvalley-2011LT5.tif")
names(landsat5) <- c("blue", "green", "red", "NIR", "SWIR1",
    "SWIR2")

Exercise 1 Make a 3-band False Color Composite plot of landsat5.

In unsupervised classification, we don’t supply any training data. This particularly is useful when we don’t have prior knowledge of the study area. The algorithm groups pixels with similar spectral characteristics into unique clusters/classes/groups following some statistically determined conditions (e.g. minimizing mean square root error within each cluster). You have to re-label and combine these spectral clusters into information classes (for e.g. land-use land-cover). Unsupervised algorithms are often referred as clustering.

To get satisfactory results from unsupervised classification, a good practice is to start with large number of centers (more clusters) and merge/group/recode similar clusters by inspecting the original imagery.

Learn more about K-means and other unsupervised-supervised algorithms here.

We will perform unsupervised classification on a spatial subset of the ndvi layer.

Now we will perform unsupervised kmeans clustering on the ndvi layer. First we use crop to make a spatial subset of the ndvi layer. The smaller spatial extent allows for faster processing (you can select any extent using the drawExtent() function).

kmeans classification

# Extent to crop ndvi layer
e <- extent(-121.807, -121.725, 38.004, 38.072)

# crop landsat by the extent
ndvi <- crop(ndvi, e)
ndvi
## class       : RasterLayer
## dimensions  : 252, 304, 76608  (nrow, ncol, ncell)
## resolution  : 0.0002694946, 0.0002694946  (x, y)
## extent      : -121.807, -121.725, 38.00413, 38.07204  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names       : layer
## values      : -0.3360085, 0.7756007  (min, max)

# convert the raster to vecor/matrix
nr <- getValues(ndvi)
str(nr)
##  num [1:76608] 0.245 0.236 0.272 0.277 0.277 ...

Please note that getValues converted the ndvi RasterLayer to array. Now we will perform the kmeans clustering on the array object and inspect the output.

# It is important to set the seed generator because `kmeans`
# initiates the centers in random locations
set.seed(99)

# We want to create 10 clusters, allow 500 iterations, start
# with 5 random sets using 'Lloyd' method
kmncluster <- kmeans(na.omit(nr), centers = 10, iter.max = 500,
    nstart = 5, algorithm = "Lloyd")

# kmeans returns an object of class 'kmeans'
str(kmncluster)
## List of 9
##  $ cluster     : int [1:76608] 6 6 3 3 3 3 3 6 6 6 ...
##  $ centers     : num [1:10, 1] 0.38986 0.00498 0.29997 -0.14263 -0.20902 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "1" "2" "3" "4" ...
##   .. ..$ : NULL
##  $ totss       : num 6459
##  $ withinss    : num [1:10] 5.18 6.13 4.91 6.11 5.75 ...
##  $ tot.withinss: num 55.8
##  $ betweenss   : num 6403
##  $ size        : int [1:10] 8736 4550 7156 8624 11672 6807 8932 5198 5040 9893
##  $ iter        : int 221
##  $ ifault      : NULL
##  - attr(*, "class")= chr "kmeans"

kmeans returns an object with 9 elements. The length of the cluster element within kmncluster is 76608 which same as length of nr created from the ndvi. The cell values of kmncluster$cluster range between 1 to 10 corresponding to the input number of cluster we provided in the kmeans function. kmncluster$cluster indicates the cluster label for corresponding pixel. We need to convert the kmncluster$cluster array back to RasterLayer of the same dimension as the ndvi.

# First create a copy of the ndvi layer
knr <- ndvi

# Now replace raster cell values with kmncluster$cluster
# array
knr[] <- kmncluster$cluster

# Alternative
values(knr) <- kmncluster$cluster
knr
## class       : RasterLayer
## dimensions  : 252, 304, 76608  (nrow, ncol, ncell)
## resolution  : 0.0002694946, 0.0002694946  (x, y)
## extent      : -121.807, -121.725, 38.00413, 38.07204  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names       : layer
## values      : 1, 10  (min, max)

We can see that knr is a RasterLayer but we do not know which cluster (1-10) belongs what LULC class. You can find that out by plotting them side-by-side with other reference layer and using unique color for each cluster.

# Define a color vector for 10 clusters (learn more about
# setting the color later)
mycolor <- c("#fef65b", "#ff0000", "#daa520", "#0000ff", "#0000ff",
    "#00ff00", "#cbbeb5", "#c3ff5b", "#ff7373", "#00ff00", "#808080")

par(mfrow = c(1, 2))
plot(ndvi, col = rev(terrain.colors(10)), main = "Landsat-NDVI")
plot(knr, main = "Unsupervised classification", col = mycolor)

image0

While for other purposes it is usually better to define more classes (and possibly fuse classes later), a simple classification like this one could be useful, e.g., merge cluster 4 and 5 to construct a water mask for the year 2011.

Important These 10 clusters are not useful unless you re-label them with LULC information class. You can save the raster output of clustering and open it in GIS platform (e.g. QGIS) and assign classes to each clusters. Multiple clusters can have the same class! A tutorial can be found here. You can skip step 1 in the tutorial which provides guidelines for generating clusters.

You can change the colors in my mycolor. Learn more about selecting colors in R here and here.

Exercise Plot 3-band RGB of landsat5 for the subset (extent e) and result of kmeans clustering side-by-side and make a table of land-use land-cover labels for the clusters. E.g. cluster 4 and 5 are water.