I loaded a Shapefile in Julia, transformed it to a datafame and then did some data cleaning, getting rid of some unneeded rows of data. I saved the shapefile again, with a new name, using “Shapefile.write” and tried to open the file in ArcGIS Pro. The layer did open, but it didn’t display on the map, saying that it didn’t have a coordinate system.
I tried to solve that in Julia, and I’m assuming that the transformation to Dataframe might have stripped that information and I need to recover it, but I don’t know how.
So, I’m guessing I have two options:
Is it possible to delete the rows of data without transforming the shapefile to a Dataframe? I’m basically a data analyst and I’m very familiar with DF, so that’s why I did it, but since the Shapefile seems compatible with the “Tables.jl” interface, I might be able to skip the transformation in some way.
Or is it possible to add a coordinate system to the file when I’m saving the dataframe as a shapefile? And if so, how do I know I’m using the correct one?
I’m sure this might be a fairly easy problem, but I’m new to GIS and I’m not still very familiar with all the ins and outs of shapefiles.
using DataFrames, Shapefile
table = Shapefile.Table(“./Statewide_CA_PLSS_NAD83AlbersCA_20240718.shp”);
plss = DataFrame(table)
# Filter data where column SECTION has “OE”, “TA” or “EA” (Sections from the Delta, Tahoe and Salton Sea)
subset!(plss, :SECTION => ByRow(x → x ∉ [“EA”, “OE”, “TA”]));
# Save file as shapefile, with a different name, on the same folder
Shapefile.write(“plss_modified.shp”, plss)
You may want to try GMT.jl. When it reads a shapefile (D = gmtread("file.shp")) data is returned a simple tabular type (a GMTdataset) (or a vector of them). Numeric data is stored in the matrix field data so you can do your matrix manipulations and at the save back to .shp with gmtwrite("filenmae.shp", D).
Thanks @alex-s-gardner but not quite there yet, I’m probably missing something. The files opened in ArcGIS without coordinates again, I might have to dig some more. I do like that with GeoDatFrames.jl I get the DF automatically and can do the manipulations with the same syntax.
@alejandromerchan the problem is that .shp files do not require CRS information in their original specification. Later on, people started to include .prj files alongside the .shp, which include a CRS string.
Currently, Shapefile.jl assumes that this string is a ESRI WKT1 string when it encounters a .prj file (even though that is not necessarily the case), and it is probably not writing the .prj file in the write function for some reason?
I suggest that you open an issue in Shapefile.jl if that is the case.
using Shapefile
using Downloads: download
using GeoDataFrames
using ArchGDAL
using DataFrames
#https://juliageo.org/GeometryOps.jl/dev/tutorials/creating_geometry
# paths 2 files
url = "https://calpip.cdpr.ca.gov/content/groundwater/shapefiles/Statewide_CA_PLSS_NAD83AlbersCA_20240718.zip"
local_zip = "/Users/gardnera/Downloads/junk.zip";
path2shp = "/Users/gardnera/Downloads/junk/Statewide_CA_PLSS_NAD83AlbersCA_20240718.shp"
path2shp_new = replace(path2shp, ".shp" => "_new.shp")
# download and unzip
download(url, local_zip)
run(`unzip $local_zip`)
df = GeoDataFrames.read(path2shp)
df = subset!(df, "SECTION" => ByRow(x -> x ∉ ["EA", "OE", "TA"]));
# there is one geometry that does not match the others
same_geometry_index = typeof(df.geometry[1]) .== typeof.(df.geometry)
Shapefile.write(path2shp_new , df[same_geometry_index,:];force=true)
Thanks. I think I missed understanding that the OP wanted those NOT in ["OE", "TA", "EA"]. I don’t have that implemented in GMT.jl, but it should not be difficult to do.
Yes, @juliohm, the problem is somewhere there. I took the original .prj file, renamed it, moved it to the folder with my modified .shp file and it all worked. I’ll file an issue with Shapefile.jl.
We should probably embed the crs in all the Shapefile.jl geometry objects so it can’t get lost.
Not difficult to do.
The other discussed alternative is DataAPI.metadata returning the crs fir a DataFrame. But it won’t work for e.g. a NamedTuple table, so embedding the crs in Polygon and other geoms is probably the best option.