OVERVIEW
GeoStats.jl v0.36 is out with major usability and performance improvements. We are one step closer to a unique experience for geodata science and geostatistical learning in pure Julia.
TL;DR: this release provides advanced geospatial pipelines and split-apply-combine, efficient DataFrame-like access to geospatial data, and major performance improvements in interactive visualization with Makie.
RELEASE NOTES
FEATURES
- New macros @groupby, @transform and @combine for geospatial split-apply-combine
- Full integration with table transforms from TableTransforms.jl and new geospatial pipelines
- New DataFrame-like interface for geospatial data (e.g. df[rows,cols])
- Automatic selection of colorschemes from scientific types of variables
- New GridTopology with new adjacency topological relations
- New RectilinearGrid for constructing grids from x, y, z, … coordinates
- New LaplaceSmoothing geometric transform as a companion to TaubinSmoothing
- New ProjectionPursuit multivariate feature transform
- New geostatistical spectral clustering GSC algorithm
- New options for DouglasPeucker simplification of geometries
- New intersection algorithms between Ray, Segment, Line, Triangle, …
- New perimeter function and implementations of measure, area, volume, …
- New constructors with iterators for PointSet, GeometrySet, Collection, Multi, …
IMPROVEMENTS
- Major visualization speedups in Makie.jl thanks to Dehn’s 1899 triangulation from Meshes.jl
- Refactored geospatial partition methods and fixed integration with MLJ.jl clustering
- Refactored show methods of various types to emphasize geospatial domains
- Revived GPU support for IQ geostatistical simulation solver
- Updated documentation with new recommended methods for geospatial data
BREAKING
- Drop support for Query.jl in favor of TableTransforms.jl
- Drop indexable interface from Multi in favor of collect
- Drop embeddim and coordtype support for geospatial data
- Drop ClusteringTask in favor of cluster interface function
- Rename validation methods (e.g. CrossValidation → KFoldValidation)
DEMONSTRATION
Some of the new features are demonstrated in a recent video.
Below we demonstrate the features with simpler data sets.
First we load the necessary packages:
julia> using GeoStats # all-batteries included!
julia> using GeoStatsViz # except for visualization
julia> import GLMakie as Mke # choose Makie backend
Geospatial split-apply-combine
Create geospatial data over 2D grid with features a, b, c:
julia> data = georef((a = rand(10, 10), b = rand(10, 10), c=rand(1:4, 10, 10)))
10×10 CartesianGrid{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (Int64)
We can easily group by c to produce a geospatial partition:
julia> groups = @groupby(data, :c)
4 Partition
└─22 View{100 MeshData}
└─29 View{100 MeshData}
└─33 View{100 MeshData}
└─16 View{100 MeshData}
metadata: rows, names
We can then transform the groups with transforms that involve both the features and the special geometry column:
julia> transf = @transform(groups, :d = 2*:a + area(:geometry))
4 Partition
└─22 View{100 MeshData}
└─29 View{100 MeshData}
└─33 View{100 MeshData}
└─16 View{100 MeshData}
metadata: rows, names
Finally, we can combine the results of the transformed groups:
julia> result = @combine(transf, :e = mean(:d))
4 GeometrySet{2,Float64}
variables (rank 2)
└─c (Float64)
└─e (Float64)
Notice that, unlike in the video above, the combined result now holds Multi geometries for the groups instead of their centroids:
julia> result.geometry
4 GeometrySet{2,Float64}
└─22 MultiNgon{2,Float64}
└─29 MultiNgon{2,Float64}
└─33 MultiNgon{2,Float64}
└─16 MultiNgon{2,Float64}
julia> viz(result.geometry, color = 1:4)
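For readers new to the pattern, the three steps above can be sketched in plain Julia on a toy table with no geometry column. This is only an illustration of split-apply-combine, not how @groupby, @transform and @combine are implemented; the names toy_rows, toy_groups, toy_transf and toy_result are hypothetical.

```julia
using Statistics  # standard library, provides mean

# Hypothetical toy table: each row is a NamedTuple with features a and c
toy_rows = [(a = 0.25, c = 1), (a = 0.5, c = 2), (a = 0.75, c = 1), (a = 1.0, c = 2)]
Row = eltype(toy_rows)

# Split: collect the rows that share the same value of c
toy_groups = Dict{Int,Vector{Row}}()
for row in toy_rows
    push!(get!(toy_groups, row.c, Row[]), row)
end

# Apply: derive a new feature d = 2a within each group
toy_transf = Dict(k => [(; r..., d = 2 * r.a) for r in g] for (k, g) in toy_groups)

# Combine: reduce each group to a single row holding the mean of d
toy_result = sort([(c = k, e = mean(r.d for r in g)) for (k, g) in toy_transf]; by = r -> r.c)
```

The geospatial macros follow the same shape, with the difference that each group also carries its geometries along.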
Also notice how the geometry column has a special Domain type, which is lazy in many cases, avoiding the construction of expensive geometries in large datasets:
julia> data.geometry
10×10 CartesianGrid{2,Float64}
minimum: Point(0.0, 0.0)
maximum: Point(10.0, 10.0)
spacing: (1.0, 1.0)
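To make the idea of a lazy domain concrete, here is a minimal sketch in plain Julia. It is not the Meshes.jl implementation: the LazyGrid type is hypothetical, and the assumption that cells are numbered with the first dimension varying fastest is ours. The point is that only the grid size is stored and each cell is computed on demand.

```julia
# Hypothetical lazy grid: stores only the number of cells per dimension
struct LazyGrid
    nx::Int
    ny::Int
end

# Number of cells, known without building any of them
Base.length(g::LazyGrid) = g.nx * g.ny

# Build the i-th cell on demand as (lower-left corner, upper-right corner),
# assuming unit spacing and the first dimension varying fastest
function Base.getindex(g::LazyGrid, i::Int)
    1 <= i <= length(g) || throw(BoundsError(g, i))
    ix = (i - 1) % g.nx
    iy = (i - 1) ÷ g.nx
    ((ix, iy), (ix + 1, iy + 1))
end

grid = LazyGrid(10, 10)
first_cell = grid[1]    # corners of the first cell
last_cell = grid[100]   # corners of the last cell
```

Memory use stays constant no matter how many cells the grid has, which is the property that matters for large datasets.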
Geospatial transform pipelines
We can combine feature transforms from TableTransforms.jl with geometric transforms from Meshes.jl to create truly geospatial pipelines.
For example, we can create a pipeline to standardize the coordinates of the domain, then compute the z-score of the features a and b:
julia> pipe = StdCoords() → Select(:a, :b) → ZScore()
SequentialTransform
├─ StdCoords()
├─ Select([:a, :b], nothing)
└─ ZScore(all)
Applying the pipeline to a CartesianGrid domain leads to a SimpleMesh domain, but the underlying topology is still a GridTopology:
julia> score = data |> pipe
100 SimpleMesh{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
julia> topology(domain(data))
10×10 GridTopology(aperiodic, aperiodic)
julia> topology(domain(score))
10×10 GridTopology(aperiodic, aperiodic)
We can visualize both data sets (notice the second one in the bottom-left corner):
julia> viz(data, variable = :a)
julia> viz!(score, variable = :a, colorscheme = :coolwarm)
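The feature part of such a pipeline can be sketched in plain Julia as a chain of functions over a table of columns. This is a simplified stand-in under our own assumptions: the helpers zscore, select_ab and zscore_all are hypothetical, the geometric StdCoords step is left out, and the real transforms in TableTransforms.jl do more than this (for example, they can be reverted).

```julia
using Statistics  # standard library, provides mean and std

# z-score of a vector: subtract the mean, divide by the standard deviation
zscore(x) = (x .- mean(x)) ./ std(x)

# Hypothetical stand-ins for Select(:a, :b) and ZScore(),
# acting on a NamedTuple of column vectors
select_ab(t) = (a = t.a, b = t.b)
zscore_all(t) = map(zscore, t)

# Compose the steps; with ∘ the rightmost function runs first
toy_pipe = zscore_all ∘ select_ab

toy_table = (a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 30.0], c = [1, 2, 3])
toy_scores = toy_table |> toy_pipe
```

After the pipeline, both remaining columns have zero mean and unit standard deviation, and column c has been dropped by the selection step.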
DataFrame-like interface
We now support efficient access to “rows” and “columns” of geospatial data over arbitrary domains:
julia> data[1:3,:]
3 View{10×10 CartesianGrid{2,Float64}}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (Int64)
julia> data[1,:]
(a = 0.4986869004101653, b = 0.9076027666401961, c = 4, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))
julia> data[:,:a]
100-element Vector{Float64}:
0.4986869004101653
0.04423514219396685
0.7431805912560505
0.5488350636776107
0.1781593152690658
0.34287338202772666
0.01556466336416984
0.08127752655318943
0.2594584424431915
0.7372470841115343
0.5219963894546638
0.2632569123088946
0.5402482890222606
0.7302507649502713
0.30546916759280573
0.8386743894750512
0.5462308898202195
0.5822888075805445
⋮
0.43027481017912594
0.38445713289998273
0.9287552353217178
0.7189205200160556
0.4372832350782744
0.8527347769479305
0.31562905739426095
0.6731884280856284
0.5870119105459504
0.5237663202982149
0.07350755294901445
0.7963836858702719
0.650505284402908
0.3696365329236847
0.9032714797251225
0.48907464522848165
0.7947529092094066
0.280343959354032
Automatic selection of color schemes
If you are explicit about your scientific types, the visualization pipeline will suggest color schemes that are friendly to color-blind readers:
julia> scidata = data |> Coerce(:c => Multiclass)
10×10 CartesianGrid{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (CategoricalArrays.CategoricalValue{Int64, UInt32})
julia> viz(scidata, variable = :c)
And of course, all these features work with general unstructured meshes, point sets, geometry sets, and all other domain types provided by Meshes.jl.
Acknowledgements
Thanks to all contributors (cc: Elias Carvalho (@eliascarv), Gerhard Dorn (@dorn-gerhard), Claro Henrique (@ClaroHenrique), Diana Aldana). We are finally reaching the user experience I dreamed of years ago when GeoStats.jl was just a tiny research package.
Would you like to support us? Leave a star on GitHub if you haven’t already: JuliaEarth/GeoStats.jl, an extensible framework for geospatial data science and geostatistical modeling fully written in Julia.
Follow our company page on LinkedIn and Twitter for updates and job opportunities.


