Given that most people in the geo
community are not aware of the “raster” syntax in GeoStats.jl, I collect a few examples here to facilitate the transition. This post may become part of the official project documentation in the future.
The “raster” model
A “raster” is nothing more than a Julia array that is georeferenced over a geospatial grid. The “raster” model is popular in the scientific community for two main reasons:
-
it provides convenient syntax to access grid elements within “rectangular” regions,
including syntax to name array dimensions as “X”, “Y”, … and extract rectangular regions based on coordinate values. -
it can rely on DiskArrays.jl and similar packages for loading large datasets lazily, without consuming all the available RAM.
In the following sections I illustrate how GeoTable
s over Grid
s share these benefits.
1. Convenient syntax
1.1 n-dimensional array syntax
Any GeoTable
over a Grid
domain supports the n-dimensional array syntax in the row selector.
Consider the following “raster” from the NaturalEarth dataset:
using GeoStats
using GeoArtifacts
import GLMakie as Mke
raster = NaturalEarth.naturalearth1("water") |> Upscale(10, 5)
raster |> viewer
We can check the size of the underlying grid with:
julia> size(raster.geometry)
(1620, 1620)
This dataset is stored with LatLon
coordinates. For convenience, we always store the “horizontal” coordinate along the first axis of the grid (longitude in this case). We can slice the Earth as follows:
raster[(1:800, :), :] |> viewer
raster[(800:1620, :), :] |> viewer
raster[(:, 1:800), :] |> viewer
raster[(:, 500:800), :] |> viewer
We can also slice after 2D projections, which is more common in the literature:
projec = raster |> Proj(Robinson)
projec |> viewer
projec[(300:800, 600:1400), :] |> viewer
Throughout all these examples we used the :
column selector to retain all columns of the geotable. We can retain a subset of the columns with usual dataframe syntax:
raster[(1:800, 500:800), ["TEMPERATURE", "PRESSURE"]]
1.2 Slice transform syntax
To select a subset of the dataset from actual coordinate ranges, we can use the Slice
transform. To slice the x
and y
coordinates of the projected dataset with values in kilometers, we do:
using Unitful: km
projec |> Slice(x=(0km, 10000km), y=(0km, 5000km)) |> viewer
Notice that the result is no longer a “raster” because the Robinson
projection deforms the graticule, and we can’t simply rely on the sorted “x” and “y” coordinates of the first row and column to find indices across the other rows and columns. The following example makes this clearer:
projec |> Slice(x=(0km, 20000km)) |> viewer
The “first row” of the result has more “columns” than the row of the Equator. This result cannot be represented with the “raster” model. It can, however, be represented with the GeoTable
model over a domain view.
The names x
and y
can be replaced by lon
and lat
if the crs
of the domain is geographic. Below are examples of subgrids over the ellipsoid using lat/lon ranges of known places:
using Unitful: °
# Brazil
raster |> Slice(lat=(-60°, 20°), lon=(-100°, -20°)) |> viewer
# south pole
raster |> Slice(lat=(-90°, -60°)) |> viewer
2. Lazy loading
The Grid
itself is lazy, it doesn’t allocate:
julia> Base.summarysize(raster.geometry)
192
Remember that a n-dimensional array is simply a flat memory buffer with an additional tuple containing the number of elements along each dimension. In Julia, the vec
and reshape
operations are lazy:
julia> x = rand(1000, 1000);
julia> @allocated vec(x)
80
julia> @allocated reshape(vec(x), 1000, 1000)
176
If you have a DiskArray or any other array type that satisfies your memory requirements, simply georef
it over a Grid
to obtain a “lazy” GeoTable
:
raster = georef((var1 = vec(array1), var2 = vec(array2)), grid)
If you load your data with GeoIO.jl, it will take care of these low-level memory details for you.