Read Geopackage Layers to dataframe

Hi everyone,
I started applying ArchGDAL to read geopackage layers. How can one read geopackage layers to dataframes (like geopandas in python)?
Motivation: I’d like to avoid for loops and many if/else statements in the loop because I think the codes are more readable and compact with dataframe filters.

2 Likes

For this to become easy, we’d probably need to add Tables.jl support to ArchGDAL.jl first (Update from DataStreams.jl to Tables.jl Β· Issue #69 Β· yeesian/ArchGDAL.jl Β· GitHub). However looking at the old DataStreams support code in ArchGDAL, we can relatively easily hack something together:

using ArchGDAL
const AG = ArchGDAL
using DataFrames

dataset = AG.read("point.geojson")
layer = AG.getlayer(dataset, 0)

nfeat = AG.nfeature(layer)
nfield = AG.nfield(layer)

# prepare Dict with empty vectors of the right type for each field
d = Dict{String, Vector}()
featuredefn = AG.layerdefn(layer)
for field_no in 0:nfield-1
    field = AG.getfielddefn(featuredefn, field_no)
    name = AG.getname(field)
    typ = AG._FIELDTYPE[AG.gettype(field)]
    d[name] = typ[]
end
d["geometry"] = AG.IGeometry[]

# loop over the features to fill the vectors in the Dict
for fid in 0:nfeat-1
    AG.getfeature(layer, fid) do feature
        for (k, v) in pairs(d)
            if k == "geometry"
                val = AG.getgeom(feature, 0)
            else
                val = AG.getfield(feature, k)
            end
            push!(v, val)
        end
    end
end

# construct a DataFrame from the Dict
df = DataFrame(d)
println(df)

Which gives:

4Γ—3 DataFrame
β”‚ Row β”‚ FID     β”‚ geometry                          β”‚ pointname β”‚
β”‚     β”‚ Float64 β”‚ ArchGDAL.IGeometry                β”‚ String    β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1   β”‚ 2.0     β”‚ Geometry: POINT (100 0)           β”‚ point-a   β”‚
β”‚ 2   β”‚ 3.0     β”‚ Geometry: POINT (100.2785 0.0893) β”‚ point-b   β”‚
β”‚ 3   β”‚ 0.0     β”‚ Geometry: POINT (100 0)           β”‚ a         β”‚
β”‚ 4   β”‚ 3.0     β”‚ Geometry: POINT (100.2785 0.0893) β”‚ b         β”‚

This is assuming you want to keep the geometries as GDAL objects. If you want you can also first convert the geometries to a different representation, depending on your needs.

Note that this is just a DataFrame with a geometry column, and you don’t get special geospatial behavior (spatial joins etc.) with it, as in geopandas. That said, this is something I’d like to work towards. There is GeoDataFrames.jl which could be revived. But I also shared some thoughts on it here: JuliaGeo talk at FOSS4G 2019 - #8 by visr and in the README of GeoJSONTables.jl. Finishing up GeoInterfaceRFC.jl will also help here. Plenty of work to do, any help is more than welcome :slight_smile:

2 Likes