# Vector data manipulation¶

This chapter illustrates some ways in which we can manipulate vector data. We start with an example SpatVector that we read from a shapefile.

```library(terra)
f <- system.file("ex/lux.shp", package="terra")
p <- vect(f)
p
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 12, 6  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  source      : lux.shp
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
##  type        : <num>    <chr> <num>    <chr> <num> <int>
##  values      :     1 Diekirch     1 Clervaux   312 18081
##                    1 Diekirch     2 Diekirch   218 32543
##                    1 Diekirch     3  Redange   259 18664
```

We can plot these data in many ways. For example:

```plot(p, "NAME_2")
```

plot of chunk unnamed-chunk-1

## Basics¶

### Geometry and attributes¶

To extract the attributes (data.frame) from a SpatVector, use:

```d <- as.data.frame(p)
```
```##   ID_1       NAME_1 ID_2     NAME_2 AREA   POP
## 1    1     Diekirch    1   Clervaux  312 18081
## 2    1     Diekirch    2   Diekirch  218 32543
## 3    1     Diekirch    3    Redange  259 18664
## 4    1     Diekirch    4    Vianden   76  5163
## 5    1     Diekirch    5      Wiltz  263 16735
## 6    2 Grevenmacher    6 Echternach  188 18899
```

You can also extract the geometry as a a matrix (this is rarely needed).

```g <- geom(p)
```
```##      geom part        x        y hole
## [1,]    1    1 6.026519 50.17767    0
## [2,]    1    1 6.031361 50.16563    0
## [3,]    1    1 6.035646 50.16410    0
## [4,]    1    1 6.042747 50.16157    0
## [5,]    1    1 6.043894 50.16116    0
## [6,]    1    1 6.048243 50.16008    0
```

Or as “well-known-text”

```g <- geom(p, wkt=TRUE)
substr(g, 1, 50)
```
```##  [1] "POLYGON ((6.026519 50.17767, 6.031361 50.165627, 6"
##  [2] "POLYGON ((6.178368 49.876823, 6.185479 49.870525, "
##  [3] "POLYGON ((5.881378 49.870148, 5.881672 49.868866, "
##  [4] "POLYGON ((6.131309 49.972565, 6.134291 49.972382, "
##  [5] "POLYGON ((5.977929 50.026016, 5.982312 50.022949, "
##  [6] "POLYGON ((6.385532 49.837029, 6.3886 49.833683, 6."
##  [7] "POLYGON ((6.316665 49.623375, 6.31835 49.623157, 6"
##  [8] "POLYGON ((6.425158 49.731644, 6.42657 49.73082, 6."
##  [9] "POLYGON ((5.998312 49.699924, 5.998632 49.698559, "
## [10] "POLYGON ((6.039474 49.448261, 6.036906 49.448696, "
## [11] "POLYGON ((6.155963 49.685047, 6.159284 49.685036, "
## [12] "POLYGON ((6.067982 49.828465, 6.071922 49.825478, "
```

### Variables¶

You can extract a variable as you would do with a `data.frame`.

```p\$NAME_2
```
```##  [1] "Clervaux"         "Diekirch"         "Redange"          "Vianden"
##  [5] "Wiltz"            "Echternach"       "Remich"           "Grevenmacher"
##  [9] "Capellen"         "Esch-sur-Alzette" "Luxembourg"       "Mersch"
```

To sub-set a SpatVector to one or more variables you can use the notation below. Note how this is different from the above example. Above a vector of values is returned. With the approach below you get a new SpatVector with only one variable.

```p[, "NAME_2"]
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 12, 1  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  source      : lux.shp
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :   NAME_2
##  type        :    <chr>
##  values      : Clervaux
##                Diekirch
##                 Redange
```

You can add a new variable to a `SpatVector` just as if it were a `data.frame`.

```set.seed(0)
p\$lets <- sample(letters, nrow(p))
p
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 12, 7  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  source      : lux.shp
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP  lets
##  type        : <num>    <chr> <num>    <chr> <num> <int> <chr>
##  values      :     1 Diekirch     1 Clervaux   312 18081     n
##                    1 Diekirch     2 Diekirch   218 32543     y
##                    1 Diekirch     3  Redange   259 18664     d
```

Note that to get the number of geometries of SpatVector `p`, you can use `nrow(p)`, or `size(p)`. You can also do `perim(p)` to get the “length” of the spatial objects (zero for points, the length of the lines, or the perimeter of the polygons).

```perim(p)
```
```##  [1] 117100.12  93477.28  84502.45  44919.14  85032.61  74708.05  57991.42
##  [8]  81203.93  74443.82  95564.74  80618.76  70810.67
```

Assigning a new value to an existing variable.

```p\$lets <- sample(LETTERS, nrow(p))
```
```##   ID_1       NAME_1 ID_2     NAME_2 AREA   POP lets
## 1    1     Diekirch    1   Clervaux  312 18081    J
## 2    1     Diekirch    2   Diekirch  218 32543    V
## 3    1     Diekirch    3    Redange  259 18664    N
## 4    1     Diekirch    4    Vianden   76  5163    Z
## 5    1     Diekirch    5      Wiltz  263 16735    G
## 6    2 Grevenmacher    6 Echternach  188 18899    I
```

To get rid of a variable, set it to `NULL`.

```p\$lets <- NULL
```

### Merge¶

You can assign an attributes table (data.frame) to a SpatVector with `values<-`. To add attributes to a SpatVector that already has attributes use `merge` (or `cbind` if you know the order of the records is the same).

```dfr <- data.frame(District=p\$NAME_1, Canton=p\$NAME_2, Value=round(runif(length(p), 100, 1000)))
dfr <- dfr[order(dfr\$Canton), ]
pm <- merge(p, dfr, by.x=c('NAME_1', 'NAME_2'), by.y=c('District', 'Canton'))
pm
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 12, 7  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :   NAME_1   NAME_2  ID_1  ID_2  AREA   POP Value
##  type        :    <chr>    <chr> <num> <num> <num> <int> <num>
##  values      : Diekirch Clervaux     1     1   312 18081   406
##                Diekirch Diekirch     1     2   218 32543   534
##                Diekirch  Redange     1     3   259 18664   640
```
```head(pm)
```
```##         NAME_1     NAME_2 ID_1 ID_2 AREA   POP Value
## 1     Diekirch   Clervaux    1    1  312 18081   406
## 2     Diekirch   Diekirch    1    2  218 32543   534
## 3     Diekirch    Redange    1    3  259 18664   640
## 4     Diekirch    Vianden    1    4   76  5163   544
## 5     Diekirch      Wiltz    1    5  263 16735   268
## 6 Grevenmacher Echternach    2    6  188 18899   845
```

Note the new variable `Value` added to `pm`

### Records¶

Selecting rows (records).

```i <- which(p\$NAME_1 == 'Grevenmacher')
g <- p[i,]
g
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 3, 6  (geometries, attributes)
##  extent      : 6.169137, 6.528252, 49.46498, 49.85403  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  ID_1       NAME_1  ID_2       NAME_2  AREA   POP
##  type        : <num>        <chr> <num>        <chr> <num> <int>
##  values      :     2 Grevenmacher     6   Echternach   188 18899
##                    2 Grevenmacher     7       Remich   129 22366
##                    2 Grevenmacher    12 Grevenmacher   210 29828
```

It is also possible to interactively select and query records by clicking on a plotted dataset. That is difficult to show here. See `?select` for interactively selecting geometries and `?click` to identify attributes by clicking on a plot (map).

## Append¶

More example data. Object `z` consists of four polygons; `z2` is one of these four polygons.

```z <- rast(p)
dim(z) <- c(2,2)
values(z) <- 1:4
names(z) <- 'Zone'
# coerce SpatRaster to SpatVector polygons
z <- as.polygons(z)
z
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 4, 1  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  Zone
##  type        : <int>
##  values      :     1
##                    2
##                    3
```
```z2 <- z[2,]
plot(p)
```

plot of chunk zzz

To append SpatVector objects of the same (vector) type you can use `c`

```b <- rbind(p, z)
# with older versions
# b <- c(p, z)
```
```##   ID_1       NAME_1 ID_2     NAME_2 AREA   POP Zone
## 1    1     Diekirch    1   Clervaux  312 18081   NA
## 2    1     Diekirch    2   Diekirch  218 32543   NA
## 3    1     Diekirch    3    Redange  259 18664   NA
## 4    1     Diekirch    4    Vianden   76  5163   NA
## 5    1     Diekirch    5      Wiltz  263 16735   NA
## 6    2 Grevenmacher    6 Echternach  188 18899   NA
```
```tail(b)
```
```##    ID_1     NAME_1 ID_2     NAME_2 AREA    POP Zone
## 11    3 Luxembourg   10 Luxembourg  237 182607   NA
## 12    3 Luxembourg   11     Mersch  233  32112   NA
## 13  NaN       <NA>  NaN       <NA>  NaN     NA    1
## 14  NaN       <NA>  NaN       <NA>  NaN     NA    2
## 15  NaN       <NA>  NaN       <NA>  NaN     NA    3
## 16  NaN       <NA>  NaN       <NA>  NaN     NA    4
```

Note how `rbind` (`c` for older versions of terra) allows you to append `SpatVect` objects with different attribute names, unlike the standard `rbind` for `data.frame`s.

## Aggregate¶

It is common to aggregate (“dissolve”) polygons that have the same value for an attribute of interest. In this case, if we do not care about the second level subdivisions of Luxembourg, we could aggregate by the first level subdivisions.

```pa <- aggregate(p, by='NAME_1')
za <- aggregate(z)
plot(za, col='light gray', border='light gray', lwd=5)
```

plot of chunk agg

It is also possible to aggregate polygons without dissolving the borders.

```zag <- aggregate(z, dissolve=FALSE)
zag
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 1, 0  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
```
```plot(zag, col="light gray")
```

plot of chunk aggnodis

This is a structure that is similar to what you may get for an archipelago: multiple polygons represented as one entity (one row). Use `disagg` to split these up into their parts.

```zd <- disagg(zag)
zd
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 4, 0  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
```

## Overlay¶

There are many different ways to “overlay” vector data. Here are some examples:

### Erase¶

Erase a part of a SpatVector

```e <- erase(p, z2)
plot(e)
```

plot of chunk raser

### Intersect¶

Intersect SpatVectors

```i <- intersect(p, z2)
plot(i)
```

plot of chunk int

You can also `intersect` or `crop` with a SpatExtent (rectangle). The difference between `intersect` and `crop` is that with `crop` the geometry of the second argument is not added to the output.

```e <- ext(6, 6.4, 49.7, 50)
pe <- crop(p, e)
plot(p)
```

plot of chunk intext

### Union¶

Get the union of two SpatVectors.

```u <- union(p, z)
u
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 28, 7  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP  Zone
##  type        : <num>    <chr> <num>    <chr> <num> <int> <int>
##  values      :     1 Diekirch     1 Clervaux   312 18081     1
##                    1 Diekirch     1 Clervaux   312 18081     2
##                    1 Diekirch     2 Diekirch   218 32543     3
```

Note that there are many more polygons now. One for each unique combination of polygons (and attributes in this case).

```set.seed(5)
plot(u, col=sample(rainbow(length(u))))
```

plot of chunk unionplot

### Cover¶

`cover` is a combination of `intersect` and `union`. `intersect` returns new (intersected) geometries with the attributes of both input datasets. `union` appends the geometries and attributes of the input. `cover` returns the intersection and appends the other geometries and attributes of both datasets.

```cov <- cover(p, z[c(1,4),])
cov
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 11, 7  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP  Zone
##  type        : <num>    <chr> <num>    <chr> <num> <int> <int>
##  values      :     1 Diekirch     1 Clervaux   312 18081    NA
##                    1 Diekirch     2 Diekirch   218 32543    NA
##                    1 Diekirch     3  Redange   259 18664    NA
```
```plot(cov)
```

plot of chunk cov

### Difference¶

The symmetrical difference of two SpatVectors

```dif <- symdif(z,p)
plot(dif, col=rainbow(length(dif)))
```

plot of chunk dif

```dif
```
```##  class       : SpatVector
##  geometry    : polygons
##  dimensions  : 4, 1  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326)
##  names       :  Zone
##  type        : <int>
##  values      :     1
##                    2
##                    3
```

## Spatial queries¶

We can query polygons with points (“point-in-polygon query”).

```pts <- matrix(c(6, 6.1, 5.9, 5.7, 6.4, 50, 49.9, 49.8, 49.7, 49.5), ncol=2)
spts <- vect(pts, crs=crs(p))
plot(z, col='light blue', lwd=2)
points(spts, col='light gray', pch=20, cex=6)
text(spts, 1:nrow(pts), col='red', font=2, cex=1.5)
lines(p, col='blue', lwd=2)
```

plot of chunk pts

`extract` is used for queries between SpatVector and SpatRaster objects, and also for queries between SpatVectors.

```extract(spts, p)
```
```##       id.y id.x
##  [1,]    1  NaN
##  [2,]    2  NaN
##  [3,]    3  NaN
##  [4,]    4  NaN
##  [5,]    5  NaN
##  [6,]    6  NaN
##  [7,]    7  NaN
##  [8,]    8  NaN
##  [9,]    9  NaN
## [10,]   10  NaN
## [11,]   11  NaN
## [12,]   12  NaN
```
```extract(spts, z)
```
```##      id.y id.x
## [1,]    1  NaN
## [2,]    2  NaN
## [3,]    3  NaN
## [4,]    4  NaN
```