[ANN] Announcing GeoDataFrames.jl

I’d like to announce GeoDataFrames.jl, a Table interface for reading and writing common vector formats, such as shapefiles and geopackages. It’s quite small, as the majority comes from the work in ArchGDAL.jl and GDAL.jl on which this package depends.

The main motivation came from the need to quickly read or write some geo-enabled formats. This was already possible with ArchGDAL, but wasn’t as straightforward as it could be. This was visible here ([0,1]) and on Github ([2,3,4]). The package is still quite young and rough, but works for me (albeit simple) production work.

An example:

Writing:

using GeoDataFrames; const GDF=GeoDataFrames
using DataFrames

coords = zip(rand(10), rand(10))
df = DataFrame(geom=createpoint.(coords), name="test");
GDF.write("test_points.shp", df)

Reading:

df = GDF.read("test_points.shp")
10×2 DataFrame
 Row │ name    geom
     │ String  IGeometr…
─────┼───────────────────────────────────────────
   1 │ test    Geometry: POINT (0.2360926400353…
   2 │ test    Geometry: POINT (0.2445619453460…
   3 │ test    Geometry: POINT (0.4504663468371…
   4 │ test    Geometry: POINT (0.0886989855586…
   5 │ test    Geometry: POINT (0.0323344938606…
   6 │ test    Geometry: POINT (0.1574232985696…
   7 │ test    Geometry: POINT (0.9677152948776…
   8 │ test    Geometry: POINT (0.0047328946715…
   9 │ test    Geometry: POINT (0.7389241862917…
  10 │ test    Geometry: POINT (0.1207370929831…

This package may eventually become obsolete as more of the code is upstreamed into ArchGDAL.

[0] ArchGDAL.getfield() error
[1] How to create a new shapefile containing a few points - #10 by rio
[2] Construct ArchGDAL. IFeatureLayer via named tuples or DataFrame · Issue #152 · yeesian/ArchGDAL.jl · GitHub
[3] Documentation [question] writing a table to a layer · Issue #153 · yeesian/ArchGDAL.jl · GitHub
[4] Failed to create a Shapefile following the GDAL Vector API tutorial · Issue #147 · yeesian/ArchGDAL.jl · GitHub

14 Likes

Thank you @evetion for the great work as usual. Just want to point out that the work that is recently happening around Meshes.jl and GeoStats.jl may potentially lead to a cleaner approach with general Tables.jl tables.

Consider a table following the Tables.jl API:

julia> using Tables

julia> table = (a=rand(100), b=rand(100)) # a Tables.jl table
(a = [0.307732211237423, 0.803718835997471, 0.08411713448982439, 0.4830139406337839, 0.89972361406729, 0.4838048624166984, 0.6572940355048909, 0.19799338569165803, 0.7805574255598815, 0.8990875618632652  …  0.07242337288301681, 0.7676977890604322, 0.8262285634679147, 0.03258811191692268, 0.5968713113063921, 0.2685941981073232, 0.4662204960862124, 0.5170359078991946, 0.5012109114845784, 0.11183068325228596], b = [0.9965173138643892, 0.11553726662208041, 0.20159410317223725, 0.7313178905382756, 0.557382816267495, 0.2856813840254393, 0.5016940394671034, 0.7547849171693573, 0.1820419942447178, 0.49498420120152087  …  0.003864021720924482, 0.3285419806520167, 0.26864603835206635, 0.23492724473816962, 0.011024989056165335, 0.8322119339786553, 0.5203882014765839, 0.6062219376724753, 0.8817101294035377, 0.6451100662465474])

We can georeference this table on various kinds of Domain defined in Meshes.jl, e.g. PointSet, CartesianGrid, GeometrySet, etc. For example, we can georeference on a PointSet:

julia> using GeoStats.jl # v0.23

julia> 𝒟 = georef(table, PointSet(rand(Point3, 100)))
100 GeoData{3,Float64}
  variables
    └─a (Float64)
    └─b (Float64)
  domain: 100 PointSet{3,Float64}

This object implements the Meshes.jl Data trait, and consequently the Tables.jl API:

julia> first(Tables.rows(𝒟), 3)
3-element Vector{NamedTuple{(:a, :b, :geometry), Tuple{Float64, Float64, Point3}}}:
 (a = 0.307732211237423, b = 0.9965173138643892, geometry = Point(0.9489465238556838, 0.6840987940920451, 0.09563305979228809))
 (a = 0.803718835997471, b = 0.11553726662208041, geometry = Point(0.055798761161550026, 0.13459614756211802, 0.06163990143241693))
 (a = 0.08411713448982439, b = 0.20159410317223725, geometry = Point(0.39791182546602677, 0.47022283912414164, 0.6568206304457964))

where an additional column called geometry has been generated on the fly with the points in the PointSet. The magic starts when we want to georeference data on large meshes. For example, we can create a large CartesianGrid without allocating any memory:

julia> grid = CartesianGrid(10, 10)
10×10 CartesianGrid{2,Float64}
  minimum: Point(0.0, 0.0)
  maximum: Point(10.0, 10.0)
  spacing: (1.0, 1.0)

and in this case the georerence operation is just a lazy wrapper:

julia> 𝒟 = georef(table, grid)
100 GeoData{2,Float64}
  variables
    └─a (Float64)
    └─b (Float64)
  domain: 10×10 CartesianGrid{2,Float64}

However, if we want to treat this data set as table we can:

julia> first(Tables.rows(𝒟), 3)
3-element Vector{NamedTuple{(:a, :b, :geometry), Tuple{Float64, Float64, Quadrangle{2, Float64, Vector{Point2}}}}}:
 (a = 0.307732211237423, b = 0.9965173138643892, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))
 (a = 0.803718835997471, b = 0.11553726662208041, geometry = Quadrangle(Point(1.0, 0.0), Point(2.0, 0.0), Point(2.0, 1.0), Point(1.0, 1.0)))
 (a = 0.08411713448982439, b = 0.20159410317223725, geometry = Quadrangle(Point(2.0, 0.0), Point(3.0, 0.0), Point(3.0, 1.0), Point(2.0, 1.0)))

Notice how the geometry column now has Quadrangle from Meshes.jl built on the fly. This is a much needed feature for large-scale geostatistics on unstructured meshes.

So moving forward, if you plan to interop with the GeoStats.jl stack for statistical work, I suggest to make your GeoDataFrame type a concrete implementation of the Meshes.jl Data trait. This is all that is needed to take advantage of the full stack of geospatial estimation, simulation, learning methods we are building. Additionally, if you implement this trait, you will get plot recipes for free, which will become Makie.jl recipes at some point.

3 Likes

Hi Júlio, thanks for the showcase. I really like the pure Julia geometry approach. Do you use robust predicates in Meshes.jl, like GitHub - lairez/ExactPredicates.jl: Fast and exact geometrical predicates in the Euclidean plane?

With regards to interfacing, I agree that traits are the way to go. GeoDataFrames makes direct use of (Arch)GDAL geometries, so it would be good to start there. I think that the long overdue rewrite of GeoInterface is the correct way to go from ArchGDAL geometries to the types Point or Quadrangle in Meshes and vica-versa. On top of that we can implement the Data interface so the attributes come along as well.

We have many robust algorithms implemented in Meshes.jl already but some of the predicates are still on the TODO list.

The plan you wrote above is perfect. I will be working on other fronts meanwhile like the MeshIO.jl package to load more meshes directly in other engineering formats not in GIS.