Spatial join with dataframes

Here is an attempt to use FlexiJoins by first finding a bounding box for each geometry. This reduces the quadratic complexity of the query (depending on how annoying the geometries are):

# prep data & environment
using DataFrames, GeoDataFrames, ArchGDAL, FlexiJoins, IntervalSets

dfa = GeoDataFrames.read("a.csv")
dfb = GeoDataFrames.read("b.csv")

getbb(g) = begin
    gg = ArchGDAL.boundingbox(g)
    gg1 = ArchGDAL.getgeom(gg,0)
    x1, y2 = ArchGDAL.getpoint(gg1,0)
    x2, y1 = ArchGDAL.getpoint(gg1,2)
    (x1..x2, y1..y2)
end

getxy(p) = begin
    x,y = ArchGDAL.getpoint(p,0)
    (x, y)
end
# process data
dfa2 = transform(dfa, 
  :geometry => ByRow(getbb) => [:xinterval, :yinterval])
dfb2 = transform(dfb, 
  :geometry => ByRow(getxy) => [:x, :y])

subset!(
  innerjoin(
    (innerjoin((dfb2,dfa2), by_pred(:x, ∈, :xinterval)),dfa2), 
    by_pred(:y, ∈, :yinterval)
  ), [:geometry, :geometry_1] => (a, b) -> intersects.(a, b))

The results are similar to the ones from a crossjoin. The bounding boxes and intervals may be of further use possibly.

3 Likes