Spatial join with dataframes

That’s a very nice example! Might I suggest to use GeoInterface for most of these operations? In that way, this would also work on Shapefile geometries, etc.

GeoInterface.x(point)  # to get the x coordinate, which is a shortcut for:
GeoInterface.getcoord(point, 1)  # for x, or without the 1 for a generator

GeoInterface.extent(geom)  # Gives you an Extents.Extent object, which is a NamedTuple
Extent(X = (1.89038215436677e6, 1.89038215436677e6), Y = (5.50127974222017e6, 5.50127974222017e6))

Well you’re reading about the cutting edge right here. :wink: So not very far I’m afraid, but given that we have great Table like packages, geometry packages and spatial index packages, it seems we should be pretty close, we just need to find a way for them to work together.

2 Likes

Yes sure. In my idea the linked scripts were potentially usable as a predicate from flexijoins.
In this sense I was addressing @aplavin , who asked for the use of efficient spatial search algorithms to be employed within flexijoins
In particular, in my quick reading, this part seemed more interesting to me, which is perhaps better than O(n^2).

“GMT.jl can do this pretty quickly. When one reads a shape file (or other OGR formats) we get a vector of GMTdataset that has, among others, a boundingbox field for each polygon. So we can do the quick scan through to see if points are, or not, inside each of the polygon’s BB and if yes than use the gmtselect module that gets us the point-in-polygon answer.”

No, and I’m not sure if this argument is really needed - so maybe it’s just a temporary workaround, in the future it won’t be required.

The ideal way would be to utilize columns that are AcceleratedArrays automatically, for the operations they support. At first, it could require manual mode selection, as you say.

Currently, reusing data preparation (like sort/tree building/hashing) in multiple joins with the same dataset is nontrivial and undocumented - through possible, and I use this functionality from time to time. AcceleratedArrays can make this cleaner and more straightforward (also more explicit?).