TL;DR
GeoStats.jl v0.24 comes with full support to general unstructured meshes via the new Meshes.jl project, and provides powerful geospatial queries built with the Query.jl language.
Geostatistics on general meshes
For those coming from other fields, geostatistics is the branch of statistics that deals with data distributed over geospatial domains. A challenge in this field is to operate efficiently with large 3D meshes, which are discretizations of these domains into simple geometric objects such as triangles, tetrahedrons, etc.
Over the last months, Iβve been working actively on efficient representations of geostatistical problems such as geostatistical estimation (e.g. interpolation), simulation, and more recently, geostatistical learning over meshes, and this research has led to a very clean API that is tightly integrated with the Julia data science and machine learning ecosystems.
To illustrate these features, letβs consider any table implementing the Tables.jl interface:
julia> using Tables
julia> table = (a=rand(100), b=rand(100)) # a Tables.jl table
(a = [0.307732211237423, 0.803718835997471, 0.08411713448982439, 0.4830139406337839, 0.89972361406729, 0.4838048624166984, 0.6572940355048909, 0.19799338569165803, 0.7805574255598815, 0.8990875618632652 β¦ 0.07242337288301681, 0.7676977890604322, 0.8262285634679147, 0.03258811191692268, 0.5968713113063921, 0.2685941981073232, 0.4662204960862124, 0.5170359078991946, 0.5012109114845784, 0.11183068325228596], b = [0.9965173138643892, 0.11553726662208041, 0.20159410317223725, 0.7313178905382756, 0.557382816267495, 0.2856813840254393, 0.5016940394671034, 0.7547849171693573, 0.1820419942447178, 0.49498420120152087 β¦ 0.003864021720924482, 0.3285419806520167, 0.26864603835206635, 0.23492724473816962, 0.011024989056165335, 0.8322119339786553, 0.5203882014765839, 0.6062219376724753, 0.8817101294035377, 0.6451100662465474])
We can easily georeference this table on various kinds of geospatial Domain
defined in Meshes.jl, e.g. PointSet
, CartesianGrid
, GeometrySet
, etc.
For example, we can georeference the table on a PointSet
:
julia> using GeoStats # v0.24
julia> π = georef(table, PointSet(rand(Point3, 100)))
100 GeoData{3,Float64}
variables
ββa (Float64)
ββb (Float64)
domain: 100 PointSet{3,Float64}
This object implements the Meshes.jl Data
trait, and consequently it can also be seen as a table:
julia> first(Tables.rows(π), 3)
3-element Vector{NamedTuple{(:a, :b, :geometry), Tuple{Float64, Float64, Point3}}}:
(a = 0.307732211237423, b = 0.9965173138643892, geometry = Point(0.9489465238556838, 0.6840987940920451, 0.09563305979228809))
(a = 0.803718835997471, b = 0.11553726662208041, geometry = Point(0.055798761161550026, 0.13459614756211802, 0.06163990143241693))
(a = 0.08411713448982439, b = 0.20159410317223725, geometry = Point(0.39791182546602677, 0.47022283912414164, 0.6568206304457964))
where an additional column called geometry
has been generated on the fly with the points in the PointSet
.
The main advantage of this design becomes clear when we want to georeference data on large meshes. For example, we can create a large CartesianGrid
without allocating memory:
julia> @allocated CartesianGrid(100,100,100)
0
julia> dom = CartesianGrid(100,100,100)
100Γ100Γ100 CartesianGrid{3,Float64}
minimum: Point(0.0, 0.0, 0.0)
maximum: Point(100.0, 100.0, 100.0)
spacing: (1.0, 1.0, 1.0)
and (lazily) load a large table from a database to be georeferenced on this grid:
julia> tab = (a=rand(1000000), b=rand(1000000))
(a = [0.17820079093053254, 0.7944333803391856, 0.8376111603886238, 0.8330377164106701, 0.3823010119717243, 0.8544562649114289, 0.934674557908652, 0.0993039666891864, 0.040738538738726904, 0.341148837826444 β¦ 0.21904461787804985, 0.20420960818944778, 0.5521403980538013, 0.317198105419201, 0.6136359539215912, 0.9147469824853054, 0.7982188591455464, 0.7914957990358054, 0.7286901016022427, 0.43962802801927614], b = [0.5392095325730726, 0.257817791509767, 0.4145000606294422, 0.9371621989111043, 0.3336561075167659, 0.043759436728378454, 0.12460964715500866, 0.6513118917698182, 0.5309687456864345, 0.09521258333360771 β¦ 0.16789605649058847, 0.3147624093439827, 0.055768891575411095, 0.6452352355034459, 0.3038137169208477, 0.005174220791681394, 0.7088761618564241, 0.6580499412221101, 0.6846371637946289, 0.7824615967594442])
julia> π = georef(tab, dom)
1000000 GeoData{3,Float64}
variables
ββa (Float64)
ββb (Float64)
domain: 100Γ100Γ100 CartesianGrid{3,Float64}
The georef
operation will just create a wrapper with the table and the mesh. Similar to what we did in the PointSet
case, we can treat the result of this operation as a table:
julia> first(Tables.rows(π), 3)
3-element Vector{Any}:
(a = 0.17820079093053254, b = 0.5392095325730726, geometry = Hexahedron(Point(0.0, 0.0, 0.0), Point(1.0, 0.0, 0.0), Point(1.0, 1.0, 0.0), Point(0.0, 1.0, 0.0), Point(0.0, 0.0, 1.0), Point(1.0, 0.0, 1.0), Point(1.0, 1.0, 1.0), Point(0.0, 1.0, 1.0)))
(a = 0.7944333803391856, b = 0.257817791509767, geometry = Hexahedron(Point(1.0, 0.0, 0.0), Point(2.0, 0.0, 0.0), Point(2.0, 1.0, 0.0), Point(1.0, 1.0, 0.0), Point(1.0, 0.0, 1.0), Point(2.0, 0.0, 1.0), Point(2.0, 1.0, 1.0), Point(1.0, 1.0, 1.0)))
(a = 0.8376111603886238, b = 0.4145000606294422, geometry = Hexahedron(Point(2.0, 0.0, 0.0), Point(3.0, 0.0, 0.0), Point(3.0, 1.0, 0.0), Point(2.0, 1.0, 0.0), Point(2.0, 0.0, 1.0), Point(3.0, 0.0, 1.0), Point(3.0, 1.0, 1.0), Point(2.0, 1.0, 1.0)))
However, this time we notice Hexahedron
geometries built on the fly. This lazy construction of geometries is an important feature for large-scale (e.g. global-scale) geostatistics, which can easily involve more than hundreds of millions of geometries in 3D.
The result of georef
can be used to define geostatistical problems such as geostatistical learning, which can be solved with any classical (non-geospatial) learning model from MLJ.jl. For a concrete example, see our https://juliaearth.github.io/GeoStats.jl/stable/workflow.html.
Geospatial data science
Wouldnβt it be great if we could more easily leverage our Julia data science stack on geospatial data? In particular, wouldnβt it be great if we could perform geospatial queries using a powerful query language?
In this release, we provide integration with the amazing Queryverse, meaning that the result of georef
is query-able and that queries can involve geometric operations such as converting the Hexahedron
s of the table into their centroids to produce a new geospatial table:
using Query
julia> π |> @mutate(geometry = centroid(_.geometry))
1000000x3 query result
a β b β geometry
βββββββββββΌββββββββββββΌβββββββββββββββββββββ
0.178201 β 0.53921 β Point(0.5, 0.5, 0.5)
0.794433 β 0.257818 β Point(1.5, 0.5, 0.5)
0.837611 β 0.4145 β Point(2.5, 0.5, 0.5)
0.833038 β 0.937162 β Point(3.5, 0.5, 0.5)
0.382301 β 0.333656 β Point(4.5, 0.5, 0.5)
0.854456 β 0.0437594 β Point(5.5, 0.5, 0.5)
0.934675 β 0.12461 β Point(6.5, 0.5, 0.5)
0.099304 β 0.651312 β Point(7.5, 0.5, 0.5)
0.0407385 β 0.530969 β Point(8.5, 0.5, 0.5)
0.341149 β 0.0952126 β Point(9.5, 0.5, 0.5)
... with 999990 more rows
We can pipe as many queries as needed to clean our data for a geostatistical problem. When we are done with the queries, we can pipe the result back to a GeoData
object, which is the wrapper type returned by georef
:
julia> π |> @mutate(geometry = centroid(_.geometry)) |> GeoData
1000000 GeoData{3,Float64}
variables
ββa (Float64)
ββb (Float64)
domain: 1000000 PointSet{3,Float64}
The geospatial domain will be automatically updated. As can be seen in this example, the pipeline converted a CartesianGrid
of Hexahedron
to a PointSet
of centroids, everything in pure Julia. The associated values for a
and b
could have been adjusted accordingly in a real application.
Contributing
We invite users to try the new release and provide feedback. We have a GSoC proposal for advanced Julia programmers (no exceptions) that have experience with machine learning and geospatial data: https://julialang.org/jsoc/gsoc/GeoStats
Acknowlegements
I would like to thank @quinnj and @davidanthoff for their help with tabular interfaces, and the contributors to the Meshes.jl project for various fruitful discussions, including @serenity4 @stillyslalom @hameye and others.