"Raster" syntax in GeoStats.jl

Given that most people in the geo community are not aware of the “raster” syntax in GeoStats.jl, I collect a few examples here to facilitate the transition. This post may become part of the official project documentation in the future.

The “raster” model

A “raster” is nothing more than a Julia array that is georeferenced over a geospatial grid. The “raster” model is popular in the scientific community for two main reasons:

  1. it provides convenient syntax to access grid elements within “rectangular” regions,
    including syntax to name array dimensions as “X”, “Y”, … and extract rectangular regions based on coordinate values.

  2. it can rely on DiskArrays.jl and similar packages for loading large datasets lazily, without consuming all the available RAM.

In the following sections I illustrate how GeoTables over Grids share these benefits.

1. Convenient syntax

1.1 n-dimensional array syntax

Any GeoTable over a Grid domain supports the n-dimensional array syntax in the row selector.

Consider the following “raster” from the NaturalEarth dataset:

using GeoStats
using GeoArtifacts

import GLMakie as Mke

raster = NaturalEarth.naturalearth1("water") |> Upscale(10, 5)

raster |> viewer

We can check the size of the underlying grid with:

julia> size(raster.geometry)
(1620, 1620)

This dataset is stored with LatLon coordinates. For convenience, we always store the “horizontal” coordinate along the first axis of the grid (longitude in this case). We can slice the Earth as follows:

raster[(1:800, :), :] |> viewer

raster[(800:1620, :), :] |> viewer

raster[(:, 1:800), :] |> viewer

raster[(:, 500:800), :] |> viewer

We can also slice after 2D projections, which is more common in the literature:

projec = raster |> Proj(Robinson)

projec |> viewer

projec[(300:800, 600:1400), :] |> viewer

Throughout all these examples we used the : column selector to retain all columns of the geotable. We can retain a subset of the columns with usual dataframe syntax:

raster[(1:800, 500:800), ["TEMPERATURE", "PRESSURE"]]

1.2 Slice transform syntax

To select a subset of the dataset from actual coordinate ranges, we can use the Slice transform. To slice the x and y coordinates of the projected dataset with values in kilometers, we do:

using Unitful: km

projec |> Slice(x=(0km, 10000km), y=(0km, 5000km)) |> viewer

Notice that the result is no longer a “raster” because the Robinson projection deforms the graticule, and we can’t simply rely on the sorted “x” and “y” coordinates of the first row and column to find indices across the other rows and columns. The following example makes this clearer:

projec |> Slice(x=(0km, 20000km)) |> viewer

The “first row” of the result has more “columns” than the row of the Equator. This result cannot be represented with the “raster” model. It can, however, be represented with the GeoTable model over a domain view.

The names x and y can be replaced by lon and lat if the crs of the domain is geographic. Below are examples of subgrids over the ellipsoid using lat/lon ranges of known places:

using Unitful: °

# Brazil
raster |> Slice(lat=(-60°, 20°), lon=(-100°, -20°)) |> viewer

# south pole
raster |> Slice(lat=(-90°, -60°)) |> viewer

2. Lazy loading

The Grid itself is lazy, it doesn’t allocate:

julia> Base.summarysize(raster.geometry)
192

Remember that a n-dimensional array is simply a flat memory buffer with an additional tuple containing the number of elements along each dimension. In Julia, the vec and reshape operations are lazy:

julia> x = rand(1000, 1000);

julia> @allocated vec(x)
80

julia> @allocated reshape(vec(x), 1000, 1000)
176

If you have a DiskArray or any other array type that satisfies your memory requirements, simply georef it over a Grid to obtain a “lazy” GeoTable:

raster = georef((var1 = vec(array1), var2 = vec(array2)), grid)

If you load your data with GeoIO.jl, it will take care of these low-level memory details for you.

3 Likes