Spatial Join with GeometryOps is killing my Julia session

I’m trying to perform a spatial join in Julia between two files that aren’t that large. One is the California PLSS sections shapefile, which is publicly available here and is 164627 x 17 after some cleaning. The other I cannot share publicly, but it’s a biodiversity database that is 99316 x 42.
I perform the spatial join as mentioned in this tutorial as:

spatial_join = innerjoin((plss, biodiversity), by_pred(:geometry, GeometryOps.touches, :geometry))

And then REPL dies and I don’t get any error message. This happens both in VS Code and directly in the REPL. Any idea what’s going on? Could this be an OOM problem?

I also tried using GeoStats.jl, but I run into issues with the projection when trying to open the shapefiles.

My goal in to have a script that performs an analysis with these two files without opening them in ArcGIS. I managed to do it in R, but I’d like to do the same in Julia, but this part has been the bottleneck. I’ll appreciate any help.

Hi @alejandromerchan,

Please feel free to start a separate thread and we can try to help. Adding projections is usually easy. There is only one type of projection not covered yet that we are planning to add soon.

I don’t mind having the conversation here. The error I’m getting for both my files after running
plss = GeoIO.load("./PLSS/Statewide_CA_PLSS_NAD83AlbersCA_20240718.shp")

is

ERROR: ArgumentError: EPSG/ESRI code for the ESRI ID "NAD_1983_California_Teale_Albers" not found in dictionary. Please check https://github.com/JuliaEarth/CoordRefSystems.jl/blob/main/src/strings.jl If you know the EPSG/ESRI code of a given ESRI WKT string, please submit a pull request.

I checked the dictionary and there an "NAD_1983_Contiguous_USA_Albers" => EPSG{5070},, which I believe should work, but I’m guessing that you have to provide the exact name.

I’d like to make this work, but I also wonder if there’s a way to use the GeoDataFrame that I already have for the GeoTables.geojoin operation.

Whenever you encounter an error like this, go to epsg.io and copy/paste the ID:

There are two results as you can see:

If you click on them, and scroll down, you will see that you can choose to export the ESRI WKT. In this first result, the string is

"NAD_1983_California_Teale_Albers_FtUS"

In the second result, the string is

"NAD_1983_California_Teale_Albers_Ft_Intl"

So none of the two results match the string in your file. Let’s try the official EPSG database at Geodetic Database. If we search for “California Albers”, we get the following result:

https://epsg.org/crs_3310/NAD83-California-Albers.html

If that is the CRS of your data, then you have the EPSG code 3310.

After you confirm which of these is your CRS, we can help with next steps.

I can confirm that your CRS is NAD83 / California Albers - EPSG:3310

If you export the ESRI WKT you will see the same string stored in your file.

So the next step to get things loaded correctly is to submit a PR to CoordRefSystems.jl adding the pair "NAD_1983_California_Teale_Albers" => EPSG{3310} to the dictionary.

1 Like

Besed on the .proj file from one of my files, the last one has the right parameters, except the units are meters.

PROJCS[“NAD_1983_California_Teale_Albers”,
GEOGCS[“GCS_North_American_1983”,
DATUM[“D_North_American_1983”,
SPHEROID[“GRS_1980”,6378137.0,298.257222101]],
PRIMEM[“Greenwich”,0.0],
UNIT[“Degree”,0.0174532925199433]],
PROJECTION[“Albers”],
PARAMETER[“False_Easting”,0.0],
PARAMETER[“False_Northing”,-4000000.0],
PARAMETER[“Central_Meridian”,-120.0],
PARAMETER[“Standard_Parallel_1”,34.0],
PARAMETER[“Standard_Parallel_2”,40.5],
PARAMETER[“Latitude_Of_Origin”,0.0],
UNIT[“Meter”,1.0]]

Added the entry to the dictionary here:

1 Like

The units of the EPSG:3310 are meters:

1 Like

The last ingredient to make things work is to map the newly added EPSG{3310} to an actual CRS type. This is what I did here:

I will merge the PR and release a patch so that you can load your files.

You should be able to update your environment after the version is merged:

Please let us know if something else is not working. Preferably in a separate thread.

Thanks a lot. I’ll let you know how it goes.

1 Like

I’m sorry to hear your REPL dies on the operation.

It does seem like an OOM problem, your OS killing the Julia process for taking too much memory. What OS and how much memory do you have? Depending on the OS you should be able to find this in the logs.

In order to help fix the problem, can you share a bit on: what kind of geometry your biodiversity dataset has (points, polygons, etc.), the code used to read the data (I assume GeoDataFrames), and finally the file format of the biodiversity dataset?

Doing 164627 * 99316 = 16.350.095.132 touch operations (assuming the brute force way without a spatial index) will always require some memory, but much depends on the geometry sizes and geometry format. And I assume you really want to use touches instead of intersects?

@alejandromerchan good to just make a github issue if you hit a problem like this with GeometryOps.jl, that way it will get fixed and we are more likely to get direct notifications and respond quickly.

Hmm, I have noticed that FlexiJoins.jl consumes quite a bit of memory, significantly beyond normal DataFrames joins. @aplavin, any thoughts?

Not sure what the best thing to do is here, although you can always manually join with the predicate. But that loses the tree optimizations we have.

1 Like

Do you have an MWE? Best if that doesn’t require GeometryOps understanding (:

Unfortunately we don’t even have a GeometryOps.jl MWE here

@alejandromerchan any chance you could put some MWEs together that trigger this error?

Sorry for the late reply, I don’t check Discourse during the weekend. I’ll try to create a MWE and create an issue. As I said, I cannot share one of the files, but I’ll see if I can remove the sensitive information, I can still reproduce the problem.

1 Like