[ANN]: GeoStats.jl v0.14

Dear geospatial community, I am very happy to announce a quite exciting release of GeoStats.jl!

In this release, we’ve introduced a cleaner, unified approach to georeferencing data (e.g. tables and arrays). To demonstrate this feature, I will copy a section of the updated documentation below. Everything is done via the new georef function, which supersedes the old spatial data types.

Tables

Consider a table (e.g. DataFrame) with 25 samples of temperature and precipitation:

using GeoStats, DataFrames, Plots

table = DataFrame(T=rand(25), P=rand(25))
25×2 DataFrame
│ Row │ T          │ P         │
│     │ Float64    │ Float64   │
├─────┼────────────┼───────────┤
│ 1   │ 0.0084492  │ 0.515809  │
│ 2   │ 0.681805   │ 0.995751  │
│ 3   │ 0.420886   │ 0.803035  │
│ 4   │ 0.0636994  │ 0.752871  │
│ 5   │ 0.00587529 │ 0.547591  │
│ 6   │ 0.224369   │ 0.193632  │
│ 7   │ 0.618545   │ 0.594453  │
│ 8   │ 0.225633   │ 0.376591  │
│ 9   │ 0.234539   │ 0.505964  │
│ 10  │ 0.375081   │ 0.119859  │
│ 11  │ 0.0907714  │ 0.630515  │
│ 12  │ 0.865484   │ 0.668259  │
│ 13  │ 0.893234   │ 0.544639  │
│ 14  │ 0.574129   │ 0.707787  │
│ 15  │ 0.0179491  │ 0.665882  │
│ 16  │ 0.112314   │ 0.738942  │
│ 17  │ 0.432444   │ 0.24519   │
│ 18  │ 0.600183   │ 0.0963012 │
│ 19  │ 0.0213318  │ 0.380986  │
│ 20  │ 0.161603   │ 0.725747  │
│ 21  │ 0.195581   │ 0.661791  │
│ 22  │ 0.105743   │ 0.726038  │
│ 23  │ 0.893519   │ 0.0609867 │
│ 24  │ 0.540797   │ 0.484521  │
│ 25  │ 0.543858   │ 0.740442  │

We can georeference this table based on a given set of coordinates:

𝒟 = georef(table, PointSet(rand(2,25)))

plot(𝒟)

or alternatively, georeference it on a 5x5 regular grid (5x5 = 25 samples):

𝒟 = georef(table, RegularGrid(5,5))

plot(𝒟)
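Matching 25 rows to a 5×5 grid requires an ordering of the cells. The post does not spell this out, so what follows is only a sketch under the assumption that row i goes to the cell with linear index i in Julia's column-major order (the first dimension varies fastest). The helpers `row_to_cell` and `cell_to_row` are hypothetical; `CartesianIndices` and `LinearIndices` come from Base:

```julia
# Assumed convention (not confirmed by GeoStats.jl docs): row i of the table
# is placed in the grid cell with linear index i, in column-major order.
row_to_cell(dims::Dims, i::Integer) = Tuple(CartesianIndices(dims)[i])

# Inverse mapping: from a cell index back to the table row
cell_to_row(dims::Dims, cell::Dims) = LinearIndices(dims)[cell...]

dims = (5, 5)   # the 5×5 grid: prod(dims) == 25 table rows
for i in (1, 2, 5, 6, 25)
    println("row ", i, " -> cell ", row_to_cell(dims, i))
end
```

Under that assumption, row 6 lands in cell (1, 2), the first cell of the second grid column, and the two helpers are inverses of each other.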

In the first case, the PointSet domain type can be omitted, and GeoStats.jl will understand that the matrix passed as the second argument contains the coordinates of a point set:

𝒟 = georef(table, rand(2,25))
25 PointSet{Float64,2}
  variables
    └─P (Float64)
    └─T (Float64)
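Dropping the domain type fits naturally with Julia's multiple dispatch: a method specialized on matrices can wrap the coordinates in a point set and forward to the general method. The sketch below uses hypothetical stand-in types and functions (`PointSetSketch`, `GeoDataSketch`, `georef_sketch`); it is not the GeoStats.jl source, only an illustration of the pattern, including a check that the number of table rows matches the number of points (one point per matrix column):

```julia
# Hypothetical stand-ins for illustration only; these are not GeoStats.jl types.
struct PointSetSketch{T<:Real}
    coords::Matrix{T}          # one column per point
end
npoints(ps::PointSetSketch) = size(ps.coords, 2)

struct GeoDataSketch{D,V}
    domain::D                  # where the samples live
    values::V                  # a column table, here a NamedTuple of vectors
end

# General method: values plus an explicit point-set domain
function georef_sketch(values::NamedTuple, domain::PointSetSketch)
    nrows = length(first(values))
    nrows == npoints(domain) ||
        throw(DimensionMismatch("$nrows rows but $(npoints(domain)) points"))
    GeoDataSketch(domain, values)
end

# Convenience method: a bare matrix is interpreted as point-set coordinates
georef_sketch(values::NamedTuple, coords::AbstractMatrix{<:Real}) =
    georef_sketch(values, PointSetSketch(Matrix(coords)))

table = (T = rand(25), P = rand(25))
𝒟 = georef_sketch(table, rand(2, 25))
npoints(𝒟.domain)   # 25
```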

Another common pattern in spatial data sets is when the coordinates of the samples are already part of the table as columns. In this case, we can specify the column names as symbols:

table = DataFrame(T=rand(25), P=rand(25), X=rand(25), Y=rand(25), Z=rand(25))

𝒟 = georef(table, (:X,:Y,:Z))

plot(𝒟)
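With coordinates stored as columns, the columns named in the tuple have to be separated from the remaining variables. Here is a Base-only sketch over a NamedTuple column table; `split_coords` is a hypothetical helper, and GeoStats.jl may well do this differently through the Tables.jl interface:

```julia
# Hypothetical helper: split a NamedTuple column table into a coordinate
# matrix (one column per sample) and a NamedTuple of the other variables.
function split_coords(table::NamedTuple, coordnames::NTuple{N,Symbol}) where {N}
    # hcat gives an nrows×N matrix; permutedims turns it into N×nrows
    coords = permutedims(hcat((table[c] for c in coordnames)...))
    # keep the non-coordinate columns in their original order
    varnames = Tuple(k for k in keys(table) if k ∉ coordnames)
    vars = NamedTuple{varnames}(map(v -> table[v], varnames))
    return coords, vars
end

table = (T=rand(25), P=rand(25), X=rand(25), Y=rand(25), Z=rand(25))
coords, vars = split_coords(table, (:X, :Y, :Z))
size(coords)   # (3, 25)
keys(vars)     # (:T, :P)
```

The coordinate matrix ends up with the same one-column-per-sample layout as `rand(2,25)` in the point-set example, so both forms could feed the same point-set code path.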

Any table implementing the Tables.jl API is supported, which means we are now targeting all kinds of tables, including very large geospatial databases ❤️ We can georeference these very large tables on lightweight domains such as RegularGrid, which are stack-allocated in GeoStats.jl and therefore require no heap memory:

@allocated RegularGrid(10^6, 10^6)
0
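Such a huge grid costs nothing because a regular grid is fully described by a few numbers (size, origin and spacing), so cell coordinates can be computed on demand instead of being stored. A sketch with a hypothetical immutable struct `GridSketch`, which is not the actual RegularGrid definition, makes this concrete:

```julia
# Hypothetical stand-in for a regular grid: only parameters are stored,
# never the coordinates of individual cells.
struct GridSketch{N}
    dims::NTuple{N,Int}          # number of cells along each axis
    origin::NTuple{N,Float64}    # position of the first cell
    spacing::NTuple{N,Float64}   # distance between neighbouring cells
end

# Convenience constructor: unit spacing starting at the origin
GridSketch(dims::Vararg{Int,N}) where {N} =
    GridSketch(dims, ntuple(_ -> 0.0, N), ntuple(_ -> 1.0, N))

ncells(g::GridSketch) = prod(g.dims)

g = GridSketch(10^6, 10^6)
ncells(g)               # 1000000000000 cells described...
sizeof(g)               # ...by 48 bytes of parameters
isbitstype(typeof(g))   # true: a plain bits type with no heap references
```

Because the struct is a bits type, Julia can keep it on the stack or in registers, which is consistent with the `@allocated` result of 0 reported above for RegularGrid.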

Arrays

Consider arrays (e.g. images) with data for various spatial variables. We can georeference these arrays using a named tuple:

T, P = rand(5,5), rand(5,5)

𝒟 = georef((T=T, P=P), rand(2,25))

plot(𝒟)
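Georeferencing arrays can be thought of as a reduction to the table case: flattening each array with `vec` turns the named tuple of arrays into a named tuple of vectors, which Tables.jl already accepts as a column table. Whether GeoStats.jl does exactly this internally is an assumption; the sketch only shows the reshaping:

```julia
T, P = rand(5, 5), rand(5, 5)

# vec flattens each 5×5 array in column-major order into a length-25 vector;
# map over a NamedTuple keeps the field names, so the result is a column table.
table = map(vec, (T=T, P=P))

keys(table)       # (:T, :P)
length(table.T)   # 25, one value per point in rand(2,25)
```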

Alternatively, we can omit the coordinates and GeoStats.jl will understand that the shape of the arrays should be preserved in a regular grid:

𝒟 = georef((T=T, P=P))

plot(𝒟)
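Preserving the array shape means the grid size is read from the arrays themselves, which only works if all of them share the same size. A sketch of that check, with `infer_grid_dims` as a hypothetical helper name:

```julia
# Hypothetical helper: infer the grid size from a named tuple of arrays,
# requiring all arrays to share the same shape.
function infer_grid_dims(arrays::NamedTuple)
    dims = size(first(arrays))
    for (name, a) in pairs(arrays)
        size(a) == dims ||
            throw(DimensionMismatch("array $name has size $(size(a)), expected $dims"))
    end
    return dims
end

T, P = rand(5, 5), rand(5, 5)
infer_grid_dims((T=T, P=P))   # (5, 5)
```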

Optionally, we can specify the origin and spacing of the grid using keyword arguments:

𝒟₁ = georef((T=T, P=P), origin=(0.,0.), spacing=(1.,1.))
𝒟₂ = georef((T=T, P=P), origin=(10.,10.), spacing=(2.,2.))

plot(𝒟₁)
plot!(𝒟₂)
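The two keyword arguments determine where each cell sits. Assuming the usual convention that cell i along an axis is located at origin + (i - 1) × spacing (the post does not say whether this refers to cell corners or centers), the positions of the first and last cells of both grids can be computed as follows; `cellcoords` is a hypothetical helper:

```julia
# Assumed convention: the position of cell index i along an axis is
# origin + (i - 1) * spacing, applied component-wise via broadcasting.
cellcoords(idx::NTuple{N,Int}, origin::NTuple{N,Float64},
           spacing::NTuple{N,Float64}) where {N} =
    origin .+ (idx .- 1) .* spacing

dims = (5, 5)   # size of T and P in the example

# first and last cell of 𝒟₁: origin (0, 0), spacing (1, 1)
cellcoords((1, 1), (0.0, 0.0), (1.0, 1.0))     # (0.0, 0.0)
cellcoords(dims,   (0.0, 0.0), (1.0, 1.0))     # (4.0, 4.0)

# first and last cell of 𝒟₂: origin (10, 10), spacing (2, 2)
cellcoords((1, 1), (10.0, 10.0), (2.0, 2.0))   # (10.0, 10.0)
cellcoords(dims,   (10.0, 10.0), (2.0, 2.0))   # (18.0, 18.0)
```

Under this convention the cells of 𝒟₁ run from 0 to 4 along each axis and those of 𝒟₂ from 10 to 18, so the second grid appears shifted and stretched relative to the first when both are plotted together.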

Roadmap

What is coming next? Well, we would like to scale! We want to show the world that Julia can compete in the geospatial business alongside Python and R, which already have large user bases. We do believe we have an advantage here, though, given the amazing work the Julia community is doing to standardize table APIs and to make implementations extremely fast (thanks @bkamins @quinnj and others involved in Tables.jl, DataFrames.jl, CSV.jl, JuliaDB, …). Unlike other communities, where “geospatial” tables are re-implemented with various “tricks” to get performance, here we can rely on a generic interface and load the appropriate table for the task.

The Tables.jl development is one side of the story. The other side is happening in Meshes.jl, where I am trying to standardize basic operations on general meshes to improve interoperability across the different ecosystems that work with spatial data. This initiative is closely connected to others such as GeometryBasics.jl and GeoInterfaceRFC.jl. The ultimate goal is to seamlessly navigate a full scientific pipeline that starts with spatial modeling and analysis (e.g. GeoStats.jl), continues with physical simulation on spatial data (e.g. Gridap.jl), and ends with interactive visualization of the phenomena (e.g. Makie.jl). This won’t be easy to accomplish, but it is certainly worth trying!

This brings me to my last point: we need help. The codebase is becoming too large to maintain, and I am sure many corner cases aren’t covered. In particular, I am sure that my mental model of Tables.jl is incomplete and that various parts of the codebase probably assume a stricter “DataFrame-like” API. It would be lovely if people with more experience in tables could stress-test the package and submit PRs fixing the issues they encounter with tables other than DataFrames.jl tables. Any other form of help is welcome and highly appreciated.

Finally, I would like to thank the community for the amazing feedback and help during these years of development in Julia. Thank you @oxinabox for the Julia anti-patterns post, which helped me a lot in this release. Thank you again @bkamins and @quinnj for the amazing work in the Julia tables ecosystem, and thank you @evetion and @visr for helping with the project and fixing my Project.toml mistakes 🙂 Thank you @briochemc for raising issues when you find them (we need more of those), and thank you @jheinen for always taking action on plotting issues ❤️
