Area interpolation between Shapefile (country borders) and WKT (multi)polygon elements in a CSV file

I have a shapefile of administrative country borders and a CSV file where one field is “geometry” and embed polygon and multipolygon data as "Well Known Text (WKT). Both are in EPSG:4326.
How can I retrieve for each element in the CSV file, the area under the various countries? I saw there is a package AreaInterpolation.jl, but I don’t know how to get the data in the format required by that package.

This is the CSV file:

Foo,Geometry
aaa,"MULTIPOLYGON (((10.992 11.493, 10.743 11.493, 10.743 11.743, 10.992 11.743, 10.992 11.493)), ((11.242 10.744, 10.992 10.744, 10.992 10.994, 10.743 10.994, 10.743 11.244, 10.992 11.244, 10.992 11.493, 11.242 11.493, 11.242 10.744)))"
bbb,"POLYGON ((11.161 10.799, 11.157 10.799, 11.157 10.803, 11.152 10.803, 11.152 10.808, 11.161 10.808, 11.161 10.812, 11.161 10.821, 11.166 10.821, 11.174 10.821, 11.179 10.821, 11.183 10.821, 11.183 10.816, 11.188 10.816, 11.188 10.812, 11.183 10.812, 11.183 10.808, 11.183 10.803, 11.188 10.803, 11.188 10.794, 11.183 10.794, 11.183 10.79, 11.179 10.79, 11.174 10.79, 11.174 10.794, 11.17 10.794, 11.17 10.799, 11.166 10.799, 11.161 10.799))"
ccc,"POLYGON ((4.497 13.49, 4.497 12.991, 3.997 12.991, 3.997 13.24, 3.498 13.24, 3.498 13.49, 4.497 13.49))"
using WellKnownGeometry, GeoFormatTypes, GeometryOps, DataFrames, Proj


GeometryOps.area.((Geodesic(),), WellKnownText.(DataFrame(csv).Geometry))

I’m not sure how you want your output to look like, as you want the area for each country per your csv polygons, but here’s an example of the total area for each polygon.

This uses GDAL underneath, but we reproject the geometries first to an equal area projection (WGS 84 / NSIDC EASE-Grid 2.0 Global - EPSG:6933), so the area will be in metres squared. Note that this is a bit inefficient, as we do an intersects on all countries three times. For a larger csv you’d want to setup a spatial index first.

using GeoDataFrames
import GeoInterface as GI

countries = GeoDataFrames.read("geoBoundariesCGAZ_ADM0.shp")
countries = reproject(countries, EPSG(4326), EPSG(6933))

# rename Geometry column to wkt in the csv file for automatic detection
# otherwise we need to pass GDAL options specifically
polygons = GeoDataFrames.read("test.csv")
polygons = reproject(polygons, EPSG(4326), EPSG(6933))

polygons.countries_area_km2 = map(polygons.geometry) do geom
    # Find intersecting countries (faster than calculating intersection)
    icountries = countries.geometry[GI.intersects.(countries.geometry, (geom,))]
    # For intersecting countries, calculate the actual intersection
    intersections = GI.intersection.(icountries, (geom,))
    sum(GI.area.(intersections) / 1e6)  # in km²
end

3-element Vector{Float64}:
 3765.9015712420355
    8.600586983524826
 4486.604217175244
1 Like

Isn’t using GeometryOps.area much easier than that?

I though GI.area was deprecated

Why do you reproject both data and administrative borders to EPSG 6933 ? Because otherwise there is a bias in the original EPSG and the computation of the are is not correct ?

Your input data is in degrees (curved polygons on the sphere), while most computations assume planar geometries. So not only would the intersections be slightly wrong, your area results would be in square degrees, which is probably not what you want.

1 Like

Yes, ideally we use GeometryOps here, but it has no geodetic area yet, and I found a bug with intersecting geometries.

It does have geodesic area, just need to load Proj

I stand corrected. That said, while area has a Geodesic trait, intersects/intersection doesn’t AFAIK, so it can be quite wrong (although probably acceptable for the small scale, low latitude example here). Anyway, using GeometryOps is mostly replacing GeoInterface with GeometryOps. Here’s the code, but it fails on the intersections, maybe I’m doing something wrong. I’ll make an issue @asinghvi17

using CSV
using WellKnownGeometry
using GeoFormatTypes
import GeometryOps as GO
import Proj
using GeoDataFrames
import GeoInterface as GI

countries = GeoDataFrames.read(
    "geoBoundariesCGAZ_ADM0.shp",
)
polygons = CSV.read("test.csv", DataFrame)
polygons.geometry =
    GeoFormatTypes.WellKnownText.((GeoFormatTypes.Geom(),), polygons.Geometry)

polygons.countries_area_km2 = map(polygons.geometry) do geom
    icountries = countries.geometry[GO.intersects.(countries.geometry, (geom,))]
    intersections = GO.intersection.(icountries, (geom,))  # fails here, also with setting with `target` kwarg
    sum(GO.area.((Geodesic(),), intersections) / 1e6)  # in km²
end

I have some issues with this script (different package versions ??)

countries = GeoDataFrames.read("adm_boundaries/geoBoundariesCGAZ_ADM0.shp")
countries = reproject(countries, EPSG(4326), EPSG(6933))

This returns me an error ERROR: GDALError (CE_Failure, code 1): PROJ: cea: Invalid latitude

Strange enough, it works if I apply it in a loop:

countries = GeoDataFrames.read("adm_boundaries/geoBoundariesCGAZ_ADM0.shp")
countries.geometry = [reproject(countries[i,"geometry"], EPSG(4326), EPSG(6933)) for i in 1:size(countries,1)]

For the data file, I have the issue that reproject leaves the Geometry field as a (unchanged) text field, likely related to the fact that I have the wkt field with a different name. I solved similarly with a loop and using ArcGDAL.

import ArchGDAL
polygons = GeoDataFrames.read("test.csv")
polygons.geometry = [reproject(ArchGDAL.fromWKT(polygons[i,"Geometry"]), EPSG(4326), EPSG(6933)) for i in 1:size(polygons,1)]

So now I have both countries and polygons geometries as vector of reprojected polygons Vector{GeoInterface.Wrappers.WrapperGeometry{false, false, T, EPSG{1}} where T}

Still, the intersects call fail with error:

julia> polygons.countries_area_km2 = map(polygons.geometry) do geom
           # Find intersecting countries (faster than calculating intersection)
           icountries = countries.geometry[GI.intersects.(countries.geometry, (geom,))]
           # For intersecting countries, calculate the actual intersection
           intersections = GI.intersection.(icountries, (geom,))
           sum(GI.area.(intersections) / 1e6)  # in km²
       end
ERROR: MethodError: no method matching intersects(::PolygonTrait, ::MultiPolygonTrait, ::GeoInterface.Wrappers.Polygon{…}, ::GeoInterface.Wrappers.MultiPolygon{…})
The function `intersects` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  intersects(::Any, ::Any)
   @ GeoInterface ~/.julia/packages/GeoInterface/X1jn3/src/interface.jl:501
  intersects(::Union{CircularStringTrait, CompoundCurveTrait, CurvePolygonTrait, GeometryCollectionTrait, LineStringTrait, LinearRingTrait, MultiLineStringTrait, MultiPointTrait, MultiPolygonTrait, MultiSurfaceTrait, PointTrait, PolygonTrait, PolyhedralSurfaceTrait, TINTrait, TriangleTrait}, ::Union{CircularStringTrait, CompoundCurveTrait, CurvePolygonTrait, GeometryCollectionTrait, LineStringTrait, LinearRingTrait, MultiLineStringTrait, MultiPointTrait, MultiPolygonTrait, MultiSurfaceTrait, PointTrait, PolygonTrait, PolyhedralSurfaceTrait, TINTrait, TriangleTrait}, ::ArchGDAL.AbstractGeometry, ::ArchGDAL.AbstractGeometry)
   @ ArchGDAL ~/.julia/packages/ArchGDAL/N2rQ4/src/geointerface.jl:198

Stacktrace:
  [1] intersects(a::GeoInterface.Wrappers.Polygon{…}, b::GeoInterface.Wrappers.MultiPolygon{…})
    @ GeoInterface ~/.julia/packages/GeoInterface/X1jn3/src/interface.jl:501
  [2] _broadcast_getindex_evalf
    @ ./broadcast.jl:678 [inlined]
  [3] _broadcast_getindex
    @ ./broadcast.jl:651 [inlined]
  [4] getindex
    @ ./broadcast.jl:610 [inlined]
  [5] copy
    @ ./broadcast.jl:911 [inlined]
  [6] materialize(bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast ./broadcast.jl:872
  [7] (::var"#23#24")(geom::GeoInterface.Wrappers.MultiPolygon{false, false, Vector{…}, Nothing, EPSG{…}})
    @ Main ./REPL[6]:3
  [8] iterate
    @ ./generator.jl:48 [inlined]
  [9] _collect(c::Vector{…}, itr::Base.Generator{…}, ::Base.EltypeUnknown, isz::Base.HasShape{…})
    @ Base ./array.jl:811
 [10] collect_similar(cont::Vector{…}, itr::Base.Generator{…})
    @ Base ./array.jl:720
 [11] map(f::Function, A::Vector{GeoInterface.Wrappers.WrapperGeometry{false, false, T, EPSG{1}} where T})
    @ Base ./abstractarray.jl:3371
 [12] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

I can get intersect/intersection working by postponing the reprojection after the intersection (shoudn’t be an issue, right?), but I still have issue computing the area on the reprojected intersection.

My intersection is now a GeoInterface wrapper:

julia> c1_d1_intersection_reprojection = GeoDataFrames.reproject(c1_d1_intersection, GeoDataFrames.EPSG(4326), GeoDataFrames.EPSG(6933))
GeoInterface.Wrappers.MultiPolygon{false, false}([GeoInterface.Wrappers.Polygon([GeoInterface.Wrappers.LinearRing([(1.0365521087353812e6, 1.4566451802948604e6), … (3) … , (1.0365521087353812e6, 1.4566451802948604e6)], crs = "EPSG:6933")], crs = "EPSG:6933"), GeoInterface.Wrappers.Polygon([GeoInterface.Wrappers.LinearRing([(1.0605771925178545e6, 1.3628402165805716e6), … (7) … , (1.0605771925178545e6, 1.3628402165805716e6)], crs = "EPSG:6933")], crs = "EPSG:6933")], crs = "EPSG:6933")

How do I convert this wrapper of a (multi)polygon to an actual (multi)polygon with the “reprojected” coordinates ?

SOLVED using ArcGDAL.reproject instead of GeoDataFrames.reproject (that creates a wrapper).

Reprojecting before or after the intersection provide the same result:

import Proj
import GeoDataFrames
import GeoFormatTypes
import ArchGDAL
import GeoInterface
map_path   = joinpath(@__DIR__,"adm_boundaries","geoBoundariesCGAZ_ADM0.shp")
data_path  = "testdata.csv"
map        = GeoDataFrames.read(map_path)
map        = map[map.shapeType .== "ADM0", :] # filter only countries
data       = GeoDataFrames.read(joinpath(@__DIR__,"testdata.csv"))

# Testing with one geometry
d1 = ArchGDAL.fromWKT(data[1,"Geometry"])
c1 = map[map.shapeGroup .== "NGA","geometry"][1]
c1_d1_intersection = GeoInterface.intersection(c1,d1)
c1_d1_intersection_reprojection = ArchGDAL.reproject(c1_d1_intersection, GeoFormatTypes.EPSG(4326), GeoFormatTypes.EPSG(6933))
a1 = GeoInterface.area(c1_d1_intersection_reprojection) / 1e6 

d1_rep = ArchGDAL.reproject(d1, GeoFormatTypes.EPSG(4326), GeoFormatTypes.EPSG(6933))
c1_rep = ArchGDAL.reproject(c1, GeoFormatTypes.EPSG(4326), GeoFormatTypes.EPSG(6933))
c1_d1_intersection2 = ArchGDAL.intersection(c1,d1)
a2 = GeoInterface.area(c1_d1_intersection2) / 1e6

@assert a1 ≈ a2

1 Like

I’m happy to hear you solved your problem :slight_smile:, and good to see there’s no real difference (in this case) between the two areas when reprojecting before or after.

It can be confusing to use different level of abstractions as you’re doing here. And you’re right about different package versions, I might’ve been on the main branch of GeoDataFrames when I first tried it. I’m not sure about your manual loops, it might be using different methods (given individual elements versus the vector of elements in a DataFrame). Some pointers:

Reading the CSV with the Geometry column (not renamed to wkt as I proposed) can be done with GeoDataFrames like so (using Comma Separated Value (.csv) — GDAL documentation as reference), otherwise the Geometry is not parsed (but stays a String as you noticed).

polygons = GeoDataFrames.read("test.csv", options=["GEOM_POSSIBLE_NAMES=Geometry", "KEEP_GEOM_COLUMNS=NO"])

The invalid latitude error is fixed on the main branch, soon to be released, currently you want to use

countries = reproject(countries, EPSG(4326), EPSG(6933); always_xy=true)  # forces longitude, latitude order

Finally, there’s indeed two ways of doing this, either with (Arch)GDAL (using non-wrapped geometries everywhere though), or using GeometryOps (but I hit an error doing the latter). Mixing them doesn’t really work.

I think we need to stop recommending people use any of these GeoInterface.intersection etc methods, they’re so fragile because only ArchGDAL implemented them.

GeometryOps.intersection (etc) works on anything and is faster.

GeometryOps.reprojectis also better than ArchGDAL.reproject (and I wrote ArchGDAL.reproject)

Its unfortunate that there was a bug in GeometryOps because it really would take a fraction of this code to do the same thing.

1 Like

The key point in “replacing GeoInterface with GeometryOps” is that the GeometryOps code will work on literally any geometry and the GeoInterface code wont.

The coordinates look to be in the CRS you want them to be in, though? The “Wrapper” thing can be a bit misleading but this is a real, instantiated geometry with those coordinates, it’s not lazy.