GMT.jl can do this pretty quickly. When one reads a shape file (or other OGR formats) we get a vector of GMTdataset that has, among others, a boundingbox field for each polygon. So we can do the quick scan through to see if points are, or not, inside each of the polygon’s BB and if yes than use the gmtselect
module that gets us the point-in-polygon answer.
using GMT
D = gmtread("wwf_ecos/wwf_terr_ecos.shp");
This file has islands (holes in polygons).
Use `gmtread(..., no_islands=true)` to ignore them.
and we can see that there are 14841 polygons in that shp file
length(D)
14841
Now, this little function reports if a point falls inside any of those polygons
function inecos(D, lon, lat)
iswithin(bbox, lon, lat) = (lon >= bbox[1] && lon <= bbox[2] && lat >= bbox[3] && lat <= bbox[4])
for k = 1:length(D)
!iswithin(D[k].bbox, lon, lat) && continue
r = gmtselect([lon lat], polygon=D[k])
if (!isempty(r))
println("Point falls inside polygon $(k)")
break
end
end
nothing
end
Taking a point that falls inside the last polygon we see that it takes ~2 milli sec in my computer
@time inecos(D, 178.8, 51.6)
Point falls inside polygon 14841
0.002249 seconds (316 allocations: 25.219 KiB)
So you can process ~1000 points in 2 seconds. 10x faster than your R solution.
And what polygon did it fall into?
D[14841].attrib
Dict{String, String} with 21 entries:
"REALM" => "NA"
"ECO_ID" => "51102.00000000000"
"G200_NUM" => "0.00000000000"
"ECO_NAME" => "Aleutian Islands tundra"
"Shape_Area" => "0.03981285655"
"G200_BIOME" => "0.00000000000"
"G200_STAT" => "0.00000000000"
"PER_area" => "0.00000000000"
"GBL_STAT" => "3.00000000000"
"AREA" => "307.56219168200"
"BIOME" => "11.00000000000"
"OBJECTID" => "14923"
"ECO_NUM" => "2.00000000000"
"area_km2" => "12141"
"PERIMETER" => "2.05118966103"
"PER_area_2" => "0.00000000000"
"PER_area_1" => "0.00000000000"
"eco_code" => "NA1102"
"G200_REGIO" => ""
...
BUT, I had to fix an issue with the gmtselect
module and the fix is only in the GMT.jl master version (until I make a new release ofc).