Thank you @evetion for the great work as usual. Just want to point out that the work that is recently happening around Meshes.jl and GeoStats.jl may potentially lead to a cleaner approach with general Tables.jl tables.
Consider a table
following the Tables.jl
API:
julia> using Tables
julia> table = (a=rand(100), b=rand(100)) # a Tables.jl table
(a = [0.307732211237423, 0.803718835997471, 0.08411713448982439, 0.4830139406337839, 0.89972361406729, 0.4838048624166984, 0.6572940355048909, 0.19799338569165803, 0.7805574255598815, 0.8990875618632652 … 0.07242337288301681, 0.7676977890604322, 0.8262285634679147, 0.03258811191692268, 0.5968713113063921, 0.2685941981073232, 0.4662204960862124, 0.5170359078991946, 0.5012109114845784, 0.11183068325228596], b = [0.9965173138643892, 0.11553726662208041, 0.20159410317223725, 0.7313178905382756, 0.557382816267495, 0.2856813840254393, 0.5016940394671034, 0.7547849171693573, 0.1820419942447178, 0.49498420120152087 … 0.003864021720924482, 0.3285419806520167, 0.26864603835206635, 0.23492724473816962, 0.011024989056165335, 0.8322119339786553, 0.5203882014765839, 0.6062219376724753, 0.8817101294035377, 0.6451100662465474])
We can georeference this table on various kinds of Domain
defined in Meshes.jl, e.g. PointSet, CartesianGrid, GeometrySet, etc. For example, we can georeference on a PointSet:
julia> using GeoStats.jl # v0.23
julia> 𝒟 = georef(table, PointSet(rand(Point3, 100)))
100 GeoData{3,Float64}
variables
└─a (Float64)
└─b (Float64)
domain: 100 PointSet{3,Float64}
This object implements the Meshes.jl Data
trait, and consequently the Tables.jl API:
julia> first(Tables.rows(𝒟), 3)
3-element Vector{NamedTuple{(:a, :b, :geometry), Tuple{Float64, Float64, Point3}}}:
(a = 0.307732211237423, b = 0.9965173138643892, geometry = Point(0.9489465238556838, 0.6840987940920451, 0.09563305979228809))
(a = 0.803718835997471, b = 0.11553726662208041, geometry = Point(0.055798761161550026, 0.13459614756211802, 0.06163990143241693))
(a = 0.08411713448982439, b = 0.20159410317223725, geometry = Point(0.39791182546602677, 0.47022283912414164, 0.6568206304457964))
where an additional column called geometry
has been generated on the fly with the points in the PointSet. The magic starts when we want to georeference data on large meshes. For example, we can create a large CartesianGrid
without allocating any memory:
julia> grid = CartesianGrid(10, 10)
10×10 CartesianGrid{2,Float64}
minimum: Point(0.0, 0.0)
maximum: Point(10.0, 10.0)
spacing: (1.0, 1.0)
and in this case the georerence operation is just a lazy wrapper:
julia> 𝒟 = georef(table, grid)
100 GeoData{2,Float64}
variables
└─a (Float64)
└─b (Float64)
domain: 10×10 CartesianGrid{2,Float64}
However, if we want to treat this data set as table we can:
julia> first(Tables.rows(𝒟), 3)
3-element Vector{NamedTuple{(:a, :b, :geometry), Tuple{Float64, Float64, Quadrangle{2, Float64, Vector{Point2}}}}}:
(a = 0.307732211237423, b = 0.9965173138643892, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))
(a = 0.803718835997471, b = 0.11553726662208041, geometry = Quadrangle(Point(1.0, 0.0), Point(2.0, 0.0), Point(2.0, 1.0), Point(1.0, 1.0)))
(a = 0.08411713448982439, b = 0.20159410317223725, geometry = Quadrangle(Point(2.0, 0.0), Point(3.0, 0.0), Point(3.0, 1.0), Point(2.0, 1.0)))
Notice how the geometry column now has Quadrangle
from Meshes.jl built on the fly. This is a much needed feature for large-scale geostatistics on unstructured meshes.
So moving forward, if you plan to interop with the GeoStats.jl stack for statistical work, I suggest to make your GeoDataFrame
type a concrete implementation of the Meshes.jl Data
trait. This is all that is needed to take advantage of the full stack of geospatial estimation, simulation, learning methods we are building. Additionally, if you implement this trait, you will get plot recipes for free, which will become Makie.jl recipes at some point.