[ANN] GeoStats.jl v0.36

OVERVIEW

GeoStats.jl v0.36 is out with major usability and performance improvements. We are one step closer to a unique experience for geodata science and geostatistical learning in pure Julia.

TL;DR: this release provides advanced geospatial pipelines and split-apply-combine, efficient DataFrame-like access to geospatial data, and major performance improvements in interactive visualization with Makie.

RELEASE NOTES

FEATURES

  • New macros @groupby, @transform and @combine for geospatial split-apply-combine
  • Full integration with table transforms from TableTransforms.jl and new geospatial pipelines
  • New DataFrame-like interface for geospatial data (e.g. df[rows,cols])
  • Automatic selection of colorschemes from scientific types of variables
  • New GridTopology with new adjacency topological relations
  • New RectilinearGrid for constructing grids from x, y, z, … coordinates
  • New LaplaceSmoothing geometric transform as a companion to TaubinSmoothing
  • New ProjectionPursuit multivariate feature transform
  • New geostatistical spectral clustering GSC algorithm
  • New options for DouglasPeucker simplification of geometries
  • New intersection algorithms between Ray, Segment, Line, Triangle, …
  • New perimeter function and implementations of measure, area, volume, …
  • New constructors with iterators for PointSet, GeometrySet, Collection, Multi, …

IMPROVEMENTS

  • Major visualization speedups in Makie.jl thanks to Dehn’s 1899 triangulation from Meshes.jl
  • Refactored geospatial partition methods and fixed integration with MLJ.jl clustering
  • Refactored show methods of various types to emphasize geospatial domains
  • Revived GPU support for IQ geostatistical simulation solver
  • Updated documentation with new recommended methods for geospatial data

BREAKING

  • Drop support for Query.jl in favor of TableTransforms.jl
  • Drop indexable interface from Multi in favor of collect
  • Drop embeddim and coordtype support for geospatial data
  • Drop ClusteringTask in favor of cluster interface function
  • Rename validation methods (e.g. CrossValidation → KFoldValidation)

DEMONSTRATION

Some of the new features are demonstrated in a recent video. Below we demonstrate them with simpler data sets.

First we load the necessary packages:

julia> using GeoStats # all-batteries included!

julia> using GeoStatsViz # except for visualization

julia> import GLMakie as Mke # choose Makie backend

Geospatial split-apply-combine

Create geospatial data over a 2D grid with features a, b, c:

julia> data = georef((a = rand(10, 10), b = rand(10, 10), c=rand(1:4, 10, 10)))
10×10 CartesianGrid{2,Float64}
  variables (rank 2)
    └─a (Float64)
    └─b (Float64)
    └─c (Int64)

We can easily group by c to produce a geospatial partition:

julia> groups = @groupby(data, :c)
4 Partition
  └─22 View{100 MeshData}
  └─29 View{100 MeshData}
  └─33 View{100 MeshData}
  └─16 View{100 MeshData}
  metadata: rows, names

We can then transform the groups with transforms that involve both the features and the special geometry column:

julia> transf = @transform(groups, :d = 2*:a + area(:geometry))
4 Partition
  └─22 View{100 MeshData}
  └─29 View{100 MeshData}
  └─33 View{100 MeshData}
  └─16 View{100 MeshData}
  metadata: rows, names

Finally, we can combine the results of the transformed groups:

julia> result = @combine(transf, :e = mean(:d))
4 GeometrySet{2,Float64}
  variables (rank 2)
    └─c (Float64)
    └─e (Float64)
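
The three steps above can be sketched in plain Julia to show what split-apply-combine computes. This is only an illustration under stated assumptions and not how GeoStats.jl implements the macros: the columns are hypothetical flattened vectors, and the area of every cell is assumed to be 1.0, as in a grid with unit spacing.

```julia
using Statistics  # standard library, provides mean

# Hypothetical stand-ins for the feature columns of a 10×10 grid,
# flattened to 100 rows.
a = rand(100)
c = rand(1:4, 100)
cellarea = 1.0  # assumption: every cell of a unit-spacing grid has area 1

# split: collect the row indices that share each value of c
groups = Dict{Int,Vector{Int}}()
for (row, key) in enumerate(c)
    push!(get!(groups, key, Int[]), row)
end

# apply: compute d = 2a + area for the rows of each group
# combine: reduce each group to the mean of d
result = Dict(key => mean(2 .* a[rows] .+ cellarea) for (key, rows) in groups)
```

The geospatial version does the same bookkeeping but also carries the geometries of each group along, which is what the next output shows.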

Notice that, unlike in the video above, we have changed the behavior to return Multi geometries for the groups instead of the centroids:

julia> result.geometry
4 GeometrySet{2,Float64}
  └─22 MultiNgon{2,Float64}
  └─29 MultiNgon{2,Float64}
  └─33 MultiNgon{2,Float64}
  └─16 MultiNgon{2,Float64}

julia> viz(result.geometry, color = 1:4)

Also notice how the geometry column has a special Domain type, which is lazy in many cases, avoiding the construction of expensive geometries in large data sets:

julia> data.geometry
10×10 CartesianGrid{2,Float64}
  minimum: Point(0.0, 0.0)
  maximum: Point(10.0, 10.0)
  spacing: (1.0, 1.0)
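
To illustrate what "lazy" means here, the following plain-Julia sketch stores only the size, origin and spacing of a grid and builds the corners of a cell when asked. The names LazyGrid and cell are hypothetical, and the column-major ordering of cells is an assumption of the sketch; it is not the Meshes.jl implementation.

```julia
# Hypothetical lazy grid: stores only size, origin and spacing.
struct LazyGrid
    dims::Tuple{Int,Int}
    origin::Tuple{Float64,Float64}
    spacing::Tuple{Float64,Float64}
end

Base.length(g::LazyGrid) = prod(g.dims)

# Corners of the ind-th cell (column-major order assumed), built on demand.
function cell(g::LazyGrid, ind::Int)
    i, j = Tuple(CartesianIndices(g.dims)[ind])
    x0 = g.origin[1] + (i - 1) * g.spacing[1]
    y0 = g.origin[2] + (j - 1) * g.spacing[2]
    x1, y1 = x0 + g.spacing[1], y0 + g.spacing[2]
    return ((x0, y0), (x1, y0), (x1, y1), (x0, y1))
end

grid = LazyGrid((10, 10), (0.0, 0.0), (1.0, 1.0))
first_cell = cell(grid, 1)
```

No cell is materialized until it is requested, so memory use does not grow with the number of cells.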

Geospatial transform pipelines

We can combine feature transforms from TableTransforms.jl with geometric transforms from Meshes.jl to create truly geospatial pipelines.

For example, we can create a pipeline to standardize the coordinates of the domain, then compute the z-score of the features a and b:

julia> pipe = StdCoords() → Select(:a, :b) → ZScore()
SequentialTransform
├─ StdCoords()
├─ Select([:a, :b], nothing)
└─ ZScore(all)

Applying the pipeline to a CartesianGrid domain leads to a SimpleMesh domain, but the underlying topology is still a GridTopology:

julia> score = data |> pipe
100 SimpleMesh{2,Float64}
  variables (rank 2)
    └─a (Float64)
    └─b (Float64)

julia> topology(domain(data))
10×10 GridTopology(aperiodic, aperiodic)

julia> topology(domain(score))
10×10 GridTopology(aperiodic, aperiodic)

We can visualize both data sets (notice the second one in the bottom-left corner):

julia> viz(data, variable = :a)

julia> viz!(score, variable = :a, colorscheme = :coolwarm)
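
For readers unfamiliar with the z-score step of the pipeline above, here is a minimal plain-Julia sketch of the standard definition: subtract the mean and divide by the standard deviation. The function name zscore and the column a are hypothetical stand-ins, and the sketch does not claim to reproduce the internals of the ZScore transform.

```julia
using Statistics  # standard library, provides mean and std

# Standard z-score: center to zero mean and scale to unit standard deviation.
zscore(x) = (x .- mean(x)) ./ std(x)

a = rand(100)    # hypothetical stand-in for feature column a
za = zscore(a)   # same length as a, with mean ≈ 0 and std ≈ 1
```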

DataFrame-like interface

We now support efficient access to “rows” and “columns” of geospatial data over arbitrary domains:

julia> data[1:3,:]
3 View{10×10 CartesianGrid{2,Float64}}
  variables (rank 2)
    └─a (Float64)
    └─b (Float64)
    └─c (Int64)

julia> data[1,:]
(a = 0.4986869004101653, b = 0.9076027666401961, c = 4, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))

julia> data[:,:a]
100-element Vector{Float64}:
 0.4986869004101653
 0.04423514219396685
 0.7431805912560505
 0.5488350636776107
 0.1781593152690658
 0.34287338202772666
 0.01556466336416984
 0.08127752655318943
 0.2594584424431915
 0.7372470841115343
 0.5219963894546638
 0.2632569123088946
 0.5402482890222606
 0.7302507649502713
 0.30546916759280573
 0.8386743894750512
 0.5462308898202195
 0.5822888075805445
 ⋮
 0.43027481017912594
 0.38445713289998273
 0.9287552353217178
 0.7189205200160556
 0.4372832350782744
 0.8527347769479305
 0.31562905739426095
 0.6731884280856284
 0.5870119105459504
 0.5237663202982149
 0.07350755294901445
 0.7963836858702719
 0.650505284402908
 0.3696365329236847
 0.9032714797251225
 0.48907464522848165
 0.7947529092094066
 0.280343959354032
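
The row and column semantics above can be mimicked on a plain column table to see what each form of indexing returns. This is a sketch under assumptions: table is a hypothetical NamedTuple of vectors and getrow is a hypothetical helper; the geospatial version additionally attaches the geometry of each row.

```julia
# Hypothetical column table: a NamedTuple of equally long vectors.
table = (a = rand(100), b = rand(100), c = rand(1:4, 100))

# "Column" access returns the whole vector, analogous to data[:, :a].
col = table.a

# "Row" access gathers one entry per column into a NamedTuple,
# analogous to data[1, :] without the geometry entry.
getrow(t, i) = map(column -> column[i], t)
row = getrow(table, 1)

# A range of rows can be exposed with views instead of copies,
# analogous to the View returned by data[1:3, :].
rows = map(column -> view(column, 1:3), table)
```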

Automatic selection of color schemes

If you are explicit about your scientific types, the visualization pipeline will suggest color schemes that are friendly to color-blind readers:

julia> scidata = data |> Coerce(:c => Multiclass)
10×10 CartesianGrid{2,Float64}
  variables (rank 2)
    └─a (Float64)
    └─b (Float64)
    └─c (CategoricalArrays.CategoricalValue{Int64, UInt32})

julia> viz(scidata, variable = :c)

And of course, all these features work with general unstructured meshes, point sets, geometry sets, and all other domain types provided by Meshes.jl.

Acknowledgements

Thanks to all contributors (cc: Elias Carvalho (@eliascarv), Gerhard Dorn (@dorn-gerhard), Claro Henrique (@ClaroHenrique), Diana Aldana). We are finally reaching the user experience I dreamed of years ago, when GeoStats.jl was just a tiny research package.

Would you like to support us? Leave a :star: on GitHub if you haven’t already: JuliaEarth/GeoStats.jl, an extensible framework for geospatial data science and geostatistical modeling fully written in Julia.

Follow our company page on LinkedIn and Twitter for updates and job opportunities:

https://twitter.com/arpeggeotech

https://www.linkedin.com/company/arpeggeo
