Appendix II. The “rasterfile” format

Introduction

The raster package has a default ‘native’ file format called ‘rasterfile’. This file format is used because it is simple, flexible and extensible and does not require rgdal, which may be difficult to install. The raster package can read and write other formats via the rgdal package.

The rasterfile format is highly similar to many other formats used for raster data. It consists of two files: one file with sequential binary values (filename extension is ‘.gri’), and one header file (filename extension is ‘.grd’). The main source of variation between such file formats is in the header file, and the contents of the rasterfile header file are described here.

The purpose is to standardize the format and help others to read and write files of the same format if they wish to do so. This vignette is aimed at software developers. The typical user of the raster package does not need to be familiar with it.

ini files

The header (‘.grd’) file is organized as an ‘.ini’ file. This is a simple database format that is subdivided into sections indicated with brackets ‘[]’. Within each section there are variables and their values, separated by the equals sign ‘=’.

Thus, .ini files have a layout like this:

[section1]

var1=value

var2=value

[section2]

var3=value

var4=value

Variable names must be unique within a section, but the same variable name could occur in multiple sections. This is not done in rasterfile headers (variable names are unique across the whole file), so section names could be ignored. The raster package has a convenient function, readIniFile, to read .ini files.
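For readers working outside R, the layout above can be read with any generic ini parser. The following is a minimal sketch in Python using the standard configparser module, not the raster package's readIniFile; the function names read_header and flatten and the sample values are assumptions made for illustration.

```python
import configparser

# Sample text following the generic layout shown above (values are made up).
SAMPLE = "[section1]\nvar1=a\nvar2=b\n[section2]\nvar3=c:d\nvar4=+x=1\n"


def read_header(text):
    """Parse ini text into {section: {variable: value}}."""
    # Split only on '=' so values containing ':' stay intact, and disable
    # interpolation so a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def flatten(sections):
    """Merge all sections into one dict (safe when names are unique)."""
    flat = {}
    for variables in sections.values():
        flat.update(variables)
    return flat


header = read_header(SAMPLE)
flat = flatten(header)
print(flat)
```

Flattening is only safe because variable names do not repeat across sections; a reader for a different ini dialect would have to keep the section names.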

Sections

The rasterfile ini format lists five sections (general, georeference, data, legend, description). The sections described in this appendix are discussed below.

general

This section has two variables, ‘creator’ and ‘created’. For example:

[general]

creator=R package 'raster'

created=2010-03-13 17:26:34

These are metadata that are useful but not strictly required.

georeference

This section has the number of rows (nrows) and columns (ncols), and describes the spatial extent (bounding box) with four variables (xmin, xmax, ymin, ymax), and the coordinate reference system (projection). The dimension and extent variables are required; the projection value can be missing, as noted below.

The number of rows and columns are integers >= 1. The extent variables are numeric with xmin < xmax and ymin < ymax. The coordinates refer to the extremes of the outer cells (not to the centers of these cells).

Resolution (cell size) is not specified; it should be derived from the extent and the number of columns and rows.

The coordinate reference system is specified with the variable ‘projection’. Its value should be a string with the PROJ4 syntax. This value can be missing, but that is not recommended!

[georeference]

nrows=100

ncols=100

xmin=-180

ymin=-90

xmax=180

ymax=90

projection=+proj=longlat +datum=WGS84
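Because the resolution is not stored, a reader has to compute it. The sketch below (Python, with hypothetical function names) derives the cell size from the example header above and locates a cell centre. It assumes 0-based row and column indices with row 0 at the top (ymax) edge, which is an assumption of this example and is not stated in the format description.

```python
def resolution(xmin, xmax, ymin, ymax, nrows, ncols):
    """Return (xres, yres), the cell size along x and y."""
    return (xmax - xmin) / ncols, (ymax - ymin) / nrows


def cell_center(row, col, xmin, ymax, xres, yres):
    """Centre of a cell; assumes 0-based indices and row 0 at the top."""
    # The extent refers to the outer cell edges, hence the half-cell shift.
    x = xmin + (col + 0.5) * xres
    y = ymax - (row + 0.5) * yres
    return x, y


# Values from the [georeference] example above.
xres, yres = resolution(-180, 180, -90, 90, nrows=100, ncols=100)
x0, y0 = cell_center(0, 0, xmin=-180, ymax=90, xres=xres, yres=yres)
print(xres, yres, x0, y0)
```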

data

This section has information about the file type as well as the cell values. Here is an example:

[data]

datatype=FLT4S

nodatavalue=-3.4e+38

byteorder=little

nbands=3

bandorder=BIL

minvalue=1:0:5

maxvalue=255:200:255

datatype is required. Its value must be one of ‘LOG1S’, ‘INT1S’, ‘INT2S’, ‘INT4S’, ‘INT8S’, ‘INT1U’, ‘INT2U’, ‘FLT4S’, ‘FLT8S’. The first three letters indicate the type of number that is stored (logical, integer, or float). The fourth character determines how many bytes are used to store each value. The last letter indicates, if applicable, whether the values are signed or not (i.e. whether negative values are possible).
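The structure of these codes can be decoded mechanically. The following Python sketch splits a code into its three parts and maps it to a format character of the standard struct module. The names parse_datatype and STRUCT_FORMAT are hypothetical, and storing ‘LOG1S’ as one signed byte is an assumption of this example.

```python
import struct

# struct format character per datatype code (list taken from the text above).
# Treating LOG1S as a signed byte is an assumption of this sketch.
STRUCT_FORMAT = {
    "LOG1S": "b", "INT1S": "b", "INT2S": "h", "INT4S": "i", "INT8S": "q",
    "INT1U": "B", "INT2U": "H", "FLT4S": "f", "FLT8S": "d",
}
KINDS = {"LOG": "logical", "INT": "integer", "FLT": "float"}


def parse_datatype(code):
    """Split a code such as 'FLT4S' into (kind, nbytes, signed)."""
    if code not in STRUCT_FORMAT:
        raise ValueError("unknown datatype: %r" % code)
    kind = KINDS[code[:3]]       # first three letters: type of number
    nbytes = int(code[3])        # fourth character: bytes per value
    signed = code[4] == "S"      # last letter: signed or unsigned
    return kind, nbytes, signed


for code, fmt in STRUCT_FORMAT.items():
    # The '<' prefix selects standard sizes, so calcsize must match the code.
    assert struct.calcsize("<" + fmt) == parse_datatype(code)[1]

print(parse_datatype("FLT4S"))
```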

nodatavalue is optional, but necessary if there are nodata (NA) values. It can be any value. In the raster package, however, the lowest possible value is used for signed integer and float data types, and the highest possible value is used for unsigned integer types (this avoids using 0 as the nodata value).
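The rule for default nodata values can be sketched as follows (Python, hypothetical function name). For the float types this example returns the exact lowest finite value of that width; that is an assumption of the sketch, and the header example above shows a rounded value (-3.4e+38) instead. Logical data are not covered.

```python
import struct
import sys


def default_nodata(datatype):
    """Lowest value for signed integer/float codes, highest for unsigned."""
    kind, nbytes, signed = datatype[:3], int(datatype[3]), datatype[4] == "S"
    if kind == "INT":
        bits = 8 * nbytes
        return -(2 ** (bits - 1)) if signed else 2 ** bits - 1
    if kind == "FLT" and nbytes == 4:
        # Bit pattern of the most negative finite 4-byte float (big-endian).
        return struct.unpack(">f", bytes.fromhex("ff7fffff"))[0]
    if kind == "FLT" and nbytes == 8:
        return -sys.float_info.max
    raise ValueError("no default nodata rule for %r" % datatype)


print(default_nodata("INT2S"), default_nodata("INT1U"))
```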

byteorder is optional but recommended. It should be either ‘big’ or ‘little’. If absent, the raster package assumes that the platform byte order is used.
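To illustrate why the byte order matters, the Python sketch below decodes the same four bytes under both settings. The helper name struct_prefix is hypothetical; falling back to sys.byteorder mirrors the platform default described above.

```python
import struct
import sys


def struct_prefix(byteorder=None):
    """Translate a header byteorder value into a struct prefix."""
    if byteorder is None:
        byteorder = sys.byteorder  # header key absent: use platform order
    if byteorder not in ("big", "little"):
        raise ValueError("byteorder must be 'big' or 'little'")
    return "<" if byteorder == "little" else ">"


# One FLT4S value written in little-endian order.
raw = struct.pack("<f", 1.5)

value_le = struct.unpack(struct_prefix("little") + "f", raw)[0]
value_be = struct.unpack(struct_prefix("big") + "f", raw)[0]
print(value_le, value_be)  # only the little-endian reading recovers 1.5
```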

nbands should always be written. It indicates the number of layers (bands) stored in the file, and hence its value should be an integer >= 1. If it is absent, the raster package nevertheless assumes it is 1.

bandorder is required if nbands > 1 and ignored when nbands=1. Values can be ‘BIL’ (band interleaved by line), ‘BIP’ (band interleaved by pixel) and ‘BSQ’ (band sequential). BIL is recommended for most cases.
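The three band orders differ only in how a (row, column, band) triple maps to a position in the binary file. The Python sketch below uses the usual definitions of BIL, BIP and BSQ; the function name and the 0-based indexing are assumptions of this example. Multiplying the result by the number of bytes per value gives the byte offset in the ‘.gri’ file.

```python
def value_index(row, col, band, nrows, ncols, nbands, bandorder="BIL"):
    """0-based position of a value in the sequence of stored values."""
    if bandorder == "BSQ":   # all of band 0, then all of band 1, ...
        return band * nrows * ncols + row * ncols + col
    if bandorder == "BIL":   # per row: that row for band 0, band 1, ...
        return row * nbands * ncols + band * ncols + col
    if bandorder == "BIP":   # per cell: the values of all bands together
        return (row * ncols + col) * nbands + band
    raise ValueError("bandorder must be 'BIL', 'BIP' or 'BSQ'")


# A small 2 x 3 raster with 2 bands.
dims = dict(nrows=2, ncols=3, nbands=2)
print(value_index(0, 0, 1, bandorder="BSQ", **dims))
print(value_index(0, 0, 1, bandorder="BIL", **dims))
print(value_index(0, 1, 0, bandorder="BIP", **dims))
```

With nbands=1 and band=0 all three formulas reduce to row * ncols + col, which is consistent with bandorder being ignored for single-layer files.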

minvalue and maxvalue indicate the minimum and maximum value in each layer (excluding NA). If there are multiple layers, the values are separated by colons.
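Reading these per-layer values is a matter of splitting on the colon. A minimal Python sketch follows; the function name is hypothetical, and checking the count against nbands is a design choice of this example, not a documented requirement.

```python
def split_layer_values(text, nbands, convert=float):
    """Split a colon-separated header value into one entry per layer."""
    parts = text.split(":")
    if len(parts) != nbands:
        raise ValueError("expected %d values, got %d" % (nbands, len(parts)))
    return [convert(part) for part in parts]


# Values from the [data] example above.
minvalues = split_layer_values("1:0:5", nbands=3)
maxvalues = split_layer_values("255:200:255", nbands=3)
print(minvalues, maxvalues)
```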

If the values are integers representing a class (e.g. land cover types such as ‘forest’, ‘urban’, ‘agriculture’), four additional keys (categorical, ratnames, rattypes and ratvalues) are required to indicate that these are categorical data and to provide the columns of a ‘Raster Attribute Table’. In the example below the table has three columns (ID, landcover and code). ID refers to the actual cell value; the other columns are attributes linked to these values. ‘rattypes’ describes the data type of each column. ID would normally be ‘integer’. Other values allowed are ‘character’ and ‘numeric’. ‘ratvalues’ gives the actual values, listed column by column in the example. For example:

categorical=TRUE

ratnames=ID:landcover:code

rattypes=integer:character:numeric

ratvalues=1:2:3:Pine:Oak:Meadow:12:25:30
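The three colon-separated keys can be reassembled into a table. The Python sketch below assumes that ratvalues is stored column by column, which is what the example above suggests (three IDs, then three names, then three codes). The function name and the converter table are hypothetical.

```python
# Converters per rattypes entry; both spellings of the numeric type are
# accepted here as a precaution.
CONVERTERS = {"integer": int, "numeric": float, "numerical": float,
              "character": str}


def parse_rat(ratnames, rattypes, ratvalues):
    """Rebuild a Raster Attribute Table as {column name: list of values}."""
    names = ratnames.split(":")
    types = rattypes.split(":")
    values = ratvalues.split(":")
    if len(types) != len(names) or len(values) % len(names) != 0:
        raise ValueError("inconsistent attribute table keys")
    nrecords = len(values) // len(names)
    table = {}
    for i, (name, typ) in enumerate(zip(names, types)):
        # Assumes column-major storage: all values of column i are adjacent.
        column = values[i * nrecords:(i + 1) * nrecords]
        table[name] = [CONVERTERS[typ](v) for v in column]
    return table


rat = parse_rat("ID:landcover:code",
                "integer:character:numeric",
                "1:2:3:Pine:Oak:Meadow:12:25:30")
print(rat)
```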

description

This section only has the layer names. As above, these are separated by colons. Therefore, colons are not allowed in the layer names. If they occur, they could be replaced with a dot ‘.’.

[description]

layername=red:green:blue
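A writer therefore has to sanitize layer names before joining them. The following Python sketch applies the dot replacement suggested above; the function names are hypothetical.

```python
def encode_layernames(names):
    """Join layer names with colons, replacing any colon inside a name."""
    return ":".join(name.replace(":", ".") for name in names)


def decode_layernames(text):
    """Split the layername value back into a list of names."""
    return text.split(":")


line = "layername=" + encode_layernames(["red", "green", "blue"])
safe = encode_layernames(["band:1", "band:2"])
print(line)
print(safe)
```

The replacement is not reversible: a name that originally contained a colon comes back with a dot when decoded.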