OVERVIEW
GeoStats.jl v0.36 is out with major usability and performance improvements. We are one step closer to a unique experience for geodata science and geostatistical learning in pure Julia.
TL;DR: this release provides advanced geospatial pipelines and split-apply-combine, efficient DataFrame-like access to geospatial data, and major performance improvements in interactive visualization with Makie.
RELEASE NOTES
FEATURES
- New macros @groupby, @transform and @combine for geospatial split-apply-combine
- Full integration with table transforms from TableTransforms.jl and new geospatial pipelines
- New DataFrame-like interface for geospatial data (e.g. df[rows,cols])
- Automatic selection of colorschemes from scientific types of variables
- New GridTopology with new adjacency topological relations
- New RectilinearGrid for constructing grids from x, y, z, … coordinates
- New LaplaceSmoothing geometric transform as a companion to TaubinSmoothing
- New ProjectionPursuit multivariate feature transform
- New GSC algorithm for geostatistical spectral clustering
- New options for DouglasPeucker simplification of geometries
- New intersection algorithms between Ray, Segment, Line, Triangle, …
- New perimeter function and implementations of measure, area, volume, …
- New constructors with iterators for PointSet, GeometrySet, Collection, Multi, …
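The DouglasPeucker entry above refers to a classic polyline simplification algorithm. For readers unfamiliar with it, here is a minimal plain-Julia sketch of the textbook recursive version on 2D tuples. It is an illustration only, not the Meshes.jl implementation, and the function names perpdist and douglaspeucker are hypothetical:

```julia
# Illustrative Douglas-Peucker sketch (hypothetical names, not the Meshes.jl API)

# perpendicular distance from point p to the infinite line through a and b
function perpdist(p, a, b)
    dx, dy = b[1] - a[1], b[2] - a[2]
    len = hypot(dx, dy)
    len == 0 && return hypot(p[1] - a[1], p[2] - a[2])  # degenerate segment
    abs(dy * p[1] - dx * p[2] + b[1] * a[2] - b[2] * a[1]) / len
end

# keep both endpoints; if the farthest interior point lies more than ϵ away
# from the chord, split there and simplify each half recursively
function douglaspeucker(points::Vector{NTuple{2,Float64}}, ϵ::Float64)
    n = length(points)
    n ≤ 2 && return copy(points)
    dmax, imax = 0.0, 1
    for i in 2:n-1
        d = perpdist(points[i], points[1], points[n])
        if d > dmax
            dmax, imax = d, i
        end
    end
    if dmax > ϵ
        left = douglaspeucker(points[1:imax], ϵ)
        right = douglaspeucker(points[imax:n], ϵ)
        return vcat(left[1:end-1], right)  # drop the duplicated split point
    else
        return [points[1], points[n]]
    end
end

# a polyline with a single spike: only the spike survives simplification
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 5.0), (3.0, 0.0), (4.0, 0.0)]
simplified = douglaspeucker(pts, 1.0)  # [(0.0, 0.0), (2.0, 5.0), (4.0, 0.0)]
```

The tolerance ϵ controls the trade-off: larger values remove more vertices at the cost of a coarser shape.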
IMPROVEMENTS
- Major visualization speedups in Makie.jl thanks to Dehn's 1899 triangulation from Meshes.jl
- Refactored geospatial partition methods and fixed integration with MLJ.jl clustering
- Refactored show methods of various types to emphasize geospatial domains
- Revived GPU support for the IQ geostatistical simulation solver
- Updated documentation with new recommended methods for geospatial data
BREAKING
- Drop support for Query.jl in favor of TableTransforms.jl
- Drop indexable interface from Multi in favor of collect
- Drop embeddim and coordtype support for geospatial data
- Drop ClusteringTask in favor of the cluster interface function
- Rename validation methods (e.g. CrossValidation → KFoldValidation)
DEMONSTRATION
Some of the new features are demonstrated in a recent video:
Below we demonstrate the features with simpler data sets.
First we load the necessary packages:
julia> using GeoStats # batteries included!
julia> using GeoStatsViz # except for visualization
julia> import GLMakie as Mke # choose Makie backend
Geospatial split-apply-combine
Create geospatial data over a 2D grid with features a, b and c:
julia> data = georef((a = rand(10, 10), b = rand(10, 10), c=rand(1:4, 10, 10)))
10×10 CartesianGrid{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (Int64)
We can easily group by c to produce a geospatial partition:
julia> groups = @groupby(data, :c)
4 Partition
└─22 View{100 MeshData}
└─29 View{100 MeshData}
└─33 View{100 MeshData}
└─16 View{100 MeshData}
metadata: rows, names
We can then transform the groups with expressions that involve both the features and the special geometry column:
julia> transf = @transform(groups, :d = 2*:a + area(:geometry))
4 Partition
└─22 View{100 MeshData}
└─29 View{100 MeshData}
└─33 View{100 MeshData}
└─16 View{100 MeshData}
metadata: rows, names
Finally, we can combine the results of the transformed groups:
julia> result = @combine(transf, :e = mean(:d))
4 GeometrySet{2,Float64}
variables (rank 2)
└─c (Float64)
└─e (Float64)
Notice that, unlike in the video, the behavior has changed: we now return Multi geometries for the groups instead of their centroids:
julia> result.geometry
4 GeometrySet{2,Float64}
└─22 MultiNgon{2,Float64}
└─29 MultiNgon{2,Float64}
└─33 MultiNgon{2,Float64}
└─16 MultiNgon{2,Float64}
julia> viz(result.geometry, color = 1:4)
Also notice that the geometry column has a special Domain type, which is lazy in many cases, avoiding the construction of expensive geometries in large datasets:
julia> data.geometry
10×10 CartesianGrid{2,Float64}
minimum: Point(0.0, 0.0)
maximum: Point(10.0, 10.0)
spacing: (1.0, 1.0)
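Conceptually, the three macros follow the familiar split-apply-combine pattern. The sketch below mimics the same computation with plain Julia vectors and the Statistics standard library, assuming unit-area cells as in the 10×10 CartesianGrid with spacing (1.0, 1.0) shown above. It ignores the aggregation of geometries into Multi and is not how GeoStats.jl implements the macros; the variable names are hypothetical:

```julia
using Statistics  # mean

# hypothetical flat stand-in for the geospatial data: one entry per grid cell
a = rand(100)
c = rand(1:4, 100)
cellarea = ones(100)  # assumption: every cell of the unit-spacing grid has area 1.0

# split (like @groupby): row indices of each group of c
groups = Dict(k => findall(==(k), c) for k in unique(c))

# apply (like @transform): new column d = 2a + area
d = 2 .* a .+ cellarea

# combine (like @combine): one summary value e = mean(d) per group
e = Dict(k => mean(d[idx]) for (k, idx) in groups)
```

Each value of e is then twice the group mean of a plus one, which is a quick sanity check for the macro-based result above under the same unit-area assumption.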
Geospatial transform pipelines
We can combine feature transforms from TableTransforms.jl with geometric transforms from Meshes.jl to create truly geospatial pipelines.
For example, we can create a pipeline that standardizes the coordinates of the domain, then selects the features a and b and computes their z-scores:
julia> pipe = StdCoords() → Select(:a, :b) → ZScore()
SequentialTransform
├─ StdCoords()
├─ Select([:a, :b], nothing)
└─ ZScore(all)
Applying the pipeline to a CartesianGrid domain leads to a SimpleMesh domain, but the underlying topology is still a GridTopology:
julia> score = data |> pipe
100 SimpleMesh{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
julia> topology(domain(data))
10×10 GridTopology(aperiodic, aperiodic)
julia> topology(domain(score))
10×10 GridTopology(aperiodic, aperiodic)
We can visualize both data sets in the same figure (notice the standardized one in the bottom left corner):
julia> viz(data, variable = :a)
julia> viz!(score, variable = :a, colorscheme = :coolwarm)
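The feature part of the pipeline can be understood as ordinary function composition. The sketch below reproduces the Select and ZScore steps on a NamedTuple of columns using only Statistics; StdCoords is omitted because it acts on the domain rather than the features. The helper names are hypothetical, and the sketch uses the corrected sample standard deviation from Statistics.std, which may differ from the convention TableTransforms.jl adopts:

```julia
using Statistics  # mean, std

# z-score of a single column: subtract the mean, divide by the standard deviation
zscore(x) = (x .- mean(x)) ./ std(x)

# hypothetical stand-ins for Select(:a, :b) and ZScore() on a NamedTuple of columns
select_ab(t) = (a = t.a, b = t.b)
zscore_all(t) = map(zscore, t)

# ∘ composes right to left, so select_ab runs first,
# matching the left-to-right order of the → pipeline
pipeline = zscore_all ∘ select_ab

table = (a = rand(100), b = rand(100), c = rand(1:4, 100))
scored = pipeline(table)  # NamedTuple with standardized columns a and b
```

Composing small, single-purpose steps like this is what makes it cheap to insert or remove a stage without touching the others.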
DataFrame-like interface
We now support efficient access to “rows” and “columns” of geospatial data over arbitrary domains:
julia> data[1:3,:]
3 View{10×10 CartesianGrid{2,Float64}}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (Int64)
julia> data[1,:]
(a = 0.4986869004101653, b = 0.9076027666401961, c = 4, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))
julia> data[:,:a]
100-element Vector{Float64}:
0.4986869004101653
0.04423514219396685
0.7431805912560505
0.5488350636776107
0.1781593152690658
0.34287338202772666
0.01556466336416984
0.08127752655318943
0.2594584424431915
0.7372470841115343
0.5219963894546638
0.2632569123088946
0.5402482890222606
0.7302507649502713
0.30546916759280573
0.8386743894750512
0.5462308898202195
0.5822888075805445
⋮
0.43027481017912594
0.38445713289998273
0.9287552353217178
0.7189205200160556
0.4372832350782744
0.8527347769479305
0.31562905739426095
0.6731884280856284
0.5870119105459504
0.5237663202982149
0.07350755294901445
0.7963836858702719
0.650505284402908
0.3696365329236847
0.9032714797251225
0.48907464522848165
0.7947529092094066
0.280343959354032
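The three indexing forms above (a view of selected rows, a single row as a named tuple including the geometry, and a full column) can be mimicked with a few getindex methods. The following is a minimal sketch on a hypothetical ToyGeoTable type with strings standing in for geometries; it is not the GeoStats.jl data type, only an illustration of the row/column semantics:

```julia
# hypothetical toy type, not the GeoStats.jl data type
struct ToyGeoTable{T<:NamedTuple,G<:AbstractVector}
    values::T  # feature columns, e.g. (a = [...], b = [...])
    geoms::G   # one geometry per row (strings here as placeholders)
end

# t[i, :] → NamedTuple with the features and the geometry of row i
Base.getindex(t::ToyGeoTable, i::Integer, ::Colon) =
    merge(map(col -> col[i], t.values), (geometry = t.geoms[i],))

# t[rows, :] → lazy view of the selected rows (no copies)
Base.getindex(t::ToyGeoTable, rows::AbstractVector{<:Integer}, ::Colon) =
    ToyGeoTable(map(col -> view(col, rows), t.values), view(t.geoms, rows))

# t[:, name] → the column itself, with :geometry as a special column
Base.getindex(t::ToyGeoTable, ::Colon, name::Symbol) =
    name === :geometry ? t.geoms : getproperty(t.values, name)

t = ToyGeoTable((a = [0.5, 0.1, 0.7], b = [0.9, 0.2, 0.3]), ["cell1", "cell2", "cell3"])
row = t[1, :]    # (a = 0.5, b = 0.9, geometry = "cell1")
sub = t[2:3, :]  # view over rows 2 and 3
col = t[:, :a]   # [0.5, 0.1, 0.7]
```

Returning views for row subsets keeps the access cheap for large datasets, in the same spirit as the View types printed by data[1:3,:].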
Automatic selection of color schemes
If you are explicit about your scientific types, the visualization pipeline will suggest color schemes that are friendly to color-blind readers:
julia> scidata = data |> Coerce(:c => Multiclass)
10×10 CartesianGrid{2,Float64}
variables (rank 2)
└─a (Float64)
└─b (Float64)
└─c (CategoricalArrays.CategoricalValue{Int64, UInt32})
julia> viz(scidata, variable = :c)
And of course, all these features work with general unstructured meshes, point sets, geometry sets, and all other domain types provided by Meshes.jl.
Acknowledgements
Thanks to all contributors (cc: Elias Carvalho (@eliascarv), Gerhard Dorn (@dorn-gerhard), Claro Henrique (@ClaroHenrique), Diana Aldana). We are finally reaching the user experience I dreamed of years ago, when GeoStats.jl was just a tiny research package.
Would you like to support us? Leave a star on GitHub if you haven't already: JuliaEarth/GeoStats.jl (an extensible framework for geospatial data science and geostatistical modeling fully written in Julia)
Follow our company page on LinkedIn and Twitter for updates and job opportunities: