I’m trying to perform a spatial join in Julia between two files that aren’t that large. One is the California PLSS sections shapefile, which is publicly available here and is 164627 x 17 after some cleaning. The other I cannot share publicly, but it’s a biodiversity database that is 99316 x 42.
I perform the spatial join as mentioned in this tutorial as:
And then REPL dies and I don’t get any error message. This happens both in VS Code and directly in the REPL. Any idea what’s going on? Could this be an OOM problem?
I also tried using GeoStats.jl, but I run into issues with the projection when trying to open the shapefiles.
My goal in to have a script that performs an analysis with these two files without opening them in ArcGIS. I managed to do it in R, but I’d like to do the same in Julia, but this part has been the bottleneck. I’ll appreciate any help.
Please feel free to start a separate thread and we can try to help. Adding projections is usually easy. There is only one type of projection not covered yet that we are planning to add soon.
I don’t mind having the conversation here. The error I’m getting for both my files after running plss = GeoIO.load("./PLSS/Statewide_CA_PLSS_NAD83AlbersCA_20240718.shp")
is
ERROR: ArgumentError: EPSG/ESRI code for the ESRI ID "NAD_1983_California_Teale_Albers" not found in dictionary. Please check https://github.com/JuliaEarth/CoordRefSystems.jl/blob/main/src/strings.jl If you know the EPSG/ESRI code of a given ESRI WKT string, please submit a pull request.
I checked the dictionary and there an "NAD_1983_Contiguous_USA_Albers" => EPSG{5070},, which I believe should work, but I’m guessing that you have to provide the exact name.
I’d like to make this work, but I also wonder if there’s a way to use the GeoDataFrame that I already have for the GeoTables.geojoin operation.
So none of the two results match the string in your file. Let’s try the official EPSG database at Geodetic Database. If we search for “California Albers”, we get the following result:
If you export the ESRI WKT you will see the same string stored in your file.
So the next step to get things loaded correctly is to submit a PR to CoordRefSystems.jl adding the pair "NAD_1983_California_Teale_Albers" => EPSG{3310} to the dictionary.
I’m sorry to hear your REPL dies on the operation.
It does seem like an OOM problem, your OS killing the Julia process for taking too much memory. What OS and how much memory do you have? Depending on the OS you should be able to find this in the logs.
In order to help fix the problem, can you share a bit on: what kind of geometry your biodiversity dataset has (points, polygons, etc.), the code used to read the data (I assume GeoDataFrames), and finally the file format of the biodiversity dataset?
Doing 164627 * 99316 = 16.350.095.132 touch operations (assuming the brute force way without a spatial index) will always require some memory, but much depends on the geometry sizes and geometry format. And I assume you really want to use touches instead of intersects?
@alejandromerchan good to just make a github issue if you hit a problem like this with GeometryOps.jl, that way it will get fixed and we are more likely to get direct notifications and respond quickly.
Sorry for the late reply, I don’t check Discourse during the weekend. I’ll try to create a MWE and create an issue. As I said, I cannot share one of the files, but I’ll see if I can remove the sensitive information, I can still reproduce the problem.