JuliaGeo: Combine 2 files with points into one and export to shapefile

Hello friends,

I’ve been trying to combine 2 shapefiles that have point data.

I have opened an issue on GeoDataFrames, but I'll detail my problems here as well.

Initially I tried with a shapefile and a CSV file, but now I've tried with 2 shapefiles: one I created by finding the centroids of polygons, and the other from a different dataset:

import GeoDataFrames as GDF

centroid_df = GDF.read("/home/my-user/centroid.shp")
point_df = GDF.read("/home/my-user/point.shp")

combined_df = vcat(centroid_df, point_df, cols=:union)
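To spell out what `cols=:union` does in that call, here is a small sketch on two toy tables (the columns `area` and `voltage` and their values are made up, standing in for the two shapefile tables): every column from either table is kept, and rows from a table that lacks a column get `missing` in it.

```julia
using DataFrames

# Toy stand-ins for centroid_df and point_df (made-up columns and values)
a = DataFrame(name = ["Выборгская"], area = [12.5])
b = DataFrame(name = ["КТП-130Г"], voltage = [10])

# cols = :union keeps every column that appears in either table;
# cells for columns a table doesn't have are filled with `missing`
c = vcat(a, b, cols = :union)

println(names(c))            # column names, in order of first appearance
println(c.area, c.voltage)   # each column has one `missing` entry
```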

This works great. Now comes trying to turn this combined dataframe into a new shapefile.

# with code above

GDF.write("/home/my-user/combined.shp", combined_df)

Great. Now if I read this combined shapefile back to check that everything is okay, all of the fields that had Cyrillic or Mandarin characters in them have been turned into ?????????. If I write to a CSV file using GeoDataFrames instead, the same characters come out as Latin-1 characters.
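The Latin-1 symptom is consistent with UTF-8 bytes being read as if they were Latin-1 (or the closely related Windows-1252), where every byte becomes its own character. The sketch below only illustrates that misreading in Base Julia and how it can be reversed; `as_latin1` and `undo_latin1` are hypothetical helper names, not what GeoDataFrames or GDAL does internally.

```julia
# Hypothetical helper: read the UTF-8 bytes of `s` as if they were Latin-1,
# i.e. each byte b becomes the character with code point b
as_latin1(s::AbstractString) = String(map(Char, codeunits(s)))

# Hypothetical helper: reverse that misreading. Only possible when every
# character fits in a single byte (code point <= 0xFF)
function undo_latin1(s::AbstractString)
    all(c -> UInt32(c) <= 0xff, s) || throw(ArgumentError("not a Latin-1 misreading"))
    return String(UInt8[UInt8(c) for c in s])
end

name = "Ленинградская"
garbled = as_latin1(name)

# Each Cyrillic letter is 2 bytes in UTF-8, so it turns into 2 characters
println(length(name), " characters -> ", length(garbled), " characters")
println(undo_latin1(garbled) == name)
```

The ????????? case is different: once characters have been replaced by `?` on write, the original bytes are gone and nothing can be recovered from the file.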

Okay, well, I can try using ArchGDAL to get the latitude and longitude of the geometries in the GDF geometry column. The closest methods I can find are getx and gety:

# Just to check that one of the first items and one of the later items are sane
# There are about 700,000 rows.

import ArchGDAL as AG

AG.getx(combined_df[!,:geometry][5], 0)
# -> 3.253345042199999e6

AG.gety(combined_df[!,:geometry][5], 0)
# -> 7.0101121679000035e6

AG.getx(combined_df[!,:geometry][600000], 0)
# -> -1.5517223299039526

AG.gety(combined_df[!,:geometry][600000], 0)
# -> 47.086016784843764

Hmm, okay. Unfortunately, these numbers don’t translate well to a CSV or Excel sheet.

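using CSV, DataFrames
import ArchGDAL as AG

# Coordinates of the first (index 0) point of each geometry
combined_df[!, :latitude] = AG.gety.(combined_df[!, :geometry], 0)
combined_df[!, :longitude] = AG.getx.(combined_df[!, :geometry], 0)

# Drop the geometry column so only plain values are exported
select!(combined_df, Not(:geometry))

# bom=true writes a UTF-8 byte order mark so Excel reads the file as UTF-8
CSV.write("/home/my-user/combined_points.csv", combined_df, bom=true)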

So the characters are correct now, but some of the numbers are represented as 7.0101121679000035E+6 in the CSV file.
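Julia's default (shortest round-trip) float printing switches to exponent form for large magnitudes, which is where the `e6` comes from. One workaround, sketched here as an assumption rather than a CSV.jl feature, is to turn the coordinate columns into fixed-point strings with the Printf standard library before calling `CSV.write`. `fixed6` is a hypothetical helper, and 6 decimal places is an arbitrary choice; pick the precision your data needs.

```julia
using Printf

# Hypothetical helper: format a number as fixed-point text with 6 decimals,
# so no exponent (e.g. "e6") appears in the output
fixed6(x::Real) = @sprintf("%.6f", x)
fixed6(::Missing) = missing   # leave missing values alone

coords = [7.0101121679000035e6, -1.5517223299039526, 47.086016784843764, missing]
println(string(coords[1]))    # Julia's default shortest form, with an exponent
println(fixed6.(coords))      # fixed-point strings, no exponent
```

Applied to the table above, that could look like `combined_df[!, :latitude] = fixed6.(combined_df[!, :latitude])` before `CSV.write`. The columns then hold text, which is fine for export but not for further arithmetic.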

Any suggestions?

This looks like an issue with GDAL, specifically its treatment of character encodings in Shapefiles. I replied to the issue you filed with GeoDataFrames as well, but can you confirm that the Cyrillic/Mandarin strings in centroid_df and point_df are loaded correctly?

It seems so:

import GeoDataFrames as GDF

centroid_df = GDF.read("/home/my-user/Downloads/FindCentroidsOutput.shp")

centroid_df[!,:name]

# A few of the Cyrillic names

586069-element Vector{Union{Missing, String}}:
 "Ленинградская"
 "Кустанайская"
 "Выборгская"
 "Кирилловская"

point_df = GDF.read("/home/my-user/Downloads/point.shp")

point_df[!,:name]

# a Cyrillic name from this dataframe

129943-element Vector{Union{Missing, String}}:
"КТП-130Г"

I'd have to figure out a way to pull out some of the Cyrillic and Mandarin entries; as you can see, there's quite a bit of data in both. Let me try the example you gave on GitHub.
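A sketch for pulling out a sample of such entries with only Base Julia: test each character against a Unicode block, and check that every loaded string is valid UTF-8. The ranges (Cyrillic U+0400 to U+04FF, CJK Unified Ideographs U+4E00 to U+9FFF) are an assumption that covers common text rather than every Cyrillic or Chinese character; `iscyrillic`, `iscjk` and `has_script` are hypothetical helper names, and the sample column is made up.

```julia
# Hypothetical helpers: test a character against one Unicode block
iscyrillic(c::Char) = '\u0400' <= c <= '\u04ff'   # Cyrillic block
iscjk(c::Char) = '\u4e00' <= c <= '\u9fff'        # CJK Unified Ideographs

# True if `s` is a string containing at least one character matching `pred`;
# `missing` entries (common in shapefile attribute columns) give false
has_script(s, pred) = s isa AbstractString && any(pred, s)

# Toy column mimicking centroid_df[!, :name]
sample_names = Union{Missing, String}["Ленинградская", "Main St", missing, "北京路", "КТП-130Г"]

cyrillic = filter(s -> has_script(s, iscyrillic), sample_names)
chinese = filter(s -> has_script(s, iscjk), sample_names)

# Confirm every non-missing string is valid UTF-8
all_valid = all(s -> ismissing(s) || isvalid(s), sample_names)

println(first(cyrillic, 5))
println(first(chinese, 5))
println(all_valid)
```

Against the real data this would be, for example, `filter(s -> has_script(s, iscjk), centroid_df[!, :name])`.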

I responded to your reply on my GitHub issue.

It seems like whatever GeoDataFrames.write uses to write CSV files doesn't allow adding a UTF-8 BOM header the way CSV.write does.
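If the writer can't add the BOM itself, one workaround (a sketch, not a GeoDataFrames feature) is to prepend it afterwards. The UTF-8 BOM is the three bytes EF BB BF, i.e. U+FEFF encoded as UTF-8, and Excel uses it to recognise a CSV file as UTF-8. `add_utf8_bom!` is a hypothetical helper that rewrites the file in place and leaves it alone if the BOM is already there.

```julia
# The UTF-8 byte order mark: U+FEFF encoded as UTF-8
const UTF8_BOM = UInt8[0xef, 0xbb, 0xbf]

# Hypothetical helper: prepend the BOM to a text file unless it already has one.
# Returns true if the file was modified. Reads the whole file into memory.
function add_utf8_bom!(path::AbstractString)
    bytes = read(path)
    if length(bytes) >= 3 && bytes[1:3] == UTF8_BOM
        return false
    end
    open(path, "w") do io
        write(io, UTF8_BOM)
        write(io, bytes)
    end
    return true
end

# Usage on a throwaway file; with GeoDataFrames this would be the CSV it wrote
path = tempname() * ".csv"
write(path, "name\nЛенинградская\n")
println(add_utf8_bom!(path))   # BOM added
println(add_utf8_bom!(path))   # BOM already present, file left unchanged
```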

I imagine it's similar for shapefiles: whatever is required to write the strings in a UTF-8-compatible way isn't being done.
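For shapefiles the mechanism is different: there is no BOM; the code page of the .dbf attribute file is declared by a `.cpg` sidecar file and/or a language-driver byte in the .dbf header. GDAL's ESRI Shapefile driver documents an `ENCODING` layer creation option whose default is `LDID/87` (ISO-8859-1); my understanding is that GDAL then recodes UTF-8 strings to that code page on write, so characters Latin-1 cannot represent, such as Cyrillic or Chinese, come out as `?`. Below is a sketch of requesting UTF-8 instead, assuming your GeoDataFrames version's `write` accepts an `options` dictionary that it forwards to GDAL as layer creation options (check its docstring). The toy table and path are made up.

```julia
import GeoDataFrames as GDF
import ArchGDAL as AG
using DataFrames

# Toy table with a Cyrillic and a Chinese name (made-up data)
df = DataFrame(
    geometry = [AG.createpoint(30.31, 59.94), AG.createpoint(116.40, 39.90)],
    name = ["Ленинградская", "北京路"],
)

path = joinpath(mktempdir(), "utf8_test.shp")

# ASSUMPTION: `options` is forwarded to GDAL as layer creation options;
# ENCODING=UTF-8 asks the Shapefile driver to store the strings as UTF-8
GDF.write(path, df; options = Dict("ENCODING" => "UTF-8"))

# Read the file back and compare the attribute strings
roundtrip = GDF.read(path)
println(roundtrip.name)
```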