How can I rasterize the polygons of a Shapefile in Julia?

Just to explain the title for those unfamiliar: a Shapefile is a common file format for vectorized geospatial data. I wonder if someone can help me figure out how to rasterize Shapefiles in Julia?

There are several relevant packages: GDAL.jl is the most promising since it includes a function GDAL.rasterize() that should do exactly what I want. However, I can’t seem to decipher the C-syntax of the arguments and probably need to see it in use. Other possible options include ArchGDAL.jl and Shapefile.jl but the path forward with these is even less clear to me.

Here’s a sample smallish Shapefile from GADM.org if someone wants to give it a try (4 MB download):

It contains three shapefiles of Argentina with polygons representing the entire country (gadm36_ARG_0.shp) and two levels of more detailed subregions (gadm36_ARG_1.shp and gadm36_ARG_2.shp). It would be extremely helpful if someone could show me how to produce a raster representation of the country-level file, or at least lead me in the right direction. In my actual application I need to represent all the subregions with their own integer values, but hopefully I can figure out that step myself. (Also the shapefile is global and over 1 GB in size.)

1 Like

Yes so GDAL.jl indeed wraps the rasterize function which you can use here. ArchGDAL.jl does not cover it yet, so that means that we have to use GDAL.jl directly.

Shapefile.jl won’t be of much use here since we rely on GDAL, we might as well use GDAL to read the shapefile.

I did not go through the example you gave, but this is an example I lifted from the GDAL.jl tests. It might be best to first find the right options you want here: https://www.gdal.org/gdal_rasterize.html. And perhaps try it out first using the command line tool. Once that is working well you should be able to put in those options below. It should work the same except you won’t need to do a system call and can do it directly in GDAL from julia.

using GDAL

# register formats
GDAL.allregister()

# open a vector file
ds_point = GDAL.openex("data/point.geojson", GDAL.GDAL_OF_VECTOR, C_NULL, C_NULL, C_NULL)

# create a rasterize options object
options = GDAL.rasterizeoptionsnew(["-of","MEM","-tr","0.05","0.05"], C_NULL)
# rasterize the vector, in this case to an in memory raster (https://www.gdal.org/frmt_mem.html)
ds_rasterize = GDAL.rasterize("data/point-rasterize.mem", Ptr{GDAL.GDALDatasetH}(C_NULL), ds_point, options, C_NULL)
# tell GDAL to free the rasterize options object
GDAL.rasterizeoptionsfree(options)

# close the datasets
GDAL.close(ds_rasterize)
GDAL.close(ds_point)

2 Likes

You can do it all with GMT.jl but when using shapefiles you need to convert them first to a .gmt format with the GDAL ogr2ogr program (GMT does not yet read shapes directly). However, for current question no need for that as GMT has conveted the GDAM to a very compressed netCDF file. So plotting Argentina and overlaying Tierra del Fuego with a different polygon color take as much as

coast(DCW="AR+p0.25p+gblue", frame=:a)
coast!(DCW="AR.V+p0.25p+ggreen", fmt=:png, show=true)


and the result is. To plot more is a bit of the same but ofc not so simple.

1 Like

To follow-up from @visr’s example, the following should work if you have GDAL.jl and ArchGDAL.jl installed:

import GDAL
using ArchGDAL; const AG = ArchGDAL

AG.registerdrivers() do
GDAL.close(GDAL.rasterize(
Ptr{GDAL.GDALDatasetH}(C_NULL),
source_ds.ptr,
GDAL.rasterizeoptionsnew([
# "-of", "GTiff",         # The format/driver. Inferred from filename here.
# "-l", "gadm36_ARG_1",   # The layer(s) used for input features.
"-ts", "250", "250",      # target size: "-ts xsize ysize". Set it to whatever you want.
"-ot", "Byte"
], C_NULL), C_NULL))
end
end


I have left some of the optional arguments as comments in the example above. Feel free to explore the other options that you can pass to GDAL.rasterizeoptionsnew() in the format above.

I have also opened an issue to provide better support (and eventually documentation) for it in ArchGDAL.jl.

3 Likes

Thank you very much, all three of you! By peeking at your examples I easily managed to rasterize my Argentina demo shapefile at a country level. This works brilliantly in both GDAL and ArchGDAL. (GMT looks very promising too but I had some install problems, see the separate post.)

However, I completely hit the wall when trying to rasterize the subregions of Argentina. It took three hours before I had a working solution. The problem was that the demo shapefile lacked an attribute that contained a numerical index that I could use to burn into the raster. The index field GID_1 contained strings of the format "ARG.24_1". So I needed to brush up on some long forgotten SQL knowledge to finally get this working on the command line:

gdal_rasterize.exe -a GID_1 -ot Byte -ts 3000 5000 -dialect SQLite -sql "SELECT CAST(substr(GID_1,5) AS INTEGER)*10+15 AS GID_1,* FROM gadm36_ARG_1" gadm36_ARG_1.shp arg.tif


But then when I tried to transfer those options to GDAL.jl, I got this:

ERROR: LoadError: GDALError (CE_Failure, code 6):
The SQLite driver needs to be compiled to support the SQLite SQL dialect


So I had to use ogrinfo to add a numerical index FID to the SQL table of the shapefile, and after that it was easy to burn that into the raster using the -a FID option.

Would it be possible to compile the SQLite driver into GDAL.jl? Another thing that would be useful would be if there were a simple way to access the command line versions of gdal_rasterize, ogrinfo, ogr2ogr, etc. I had to copy a bunch of lib*.dll files into my $HOME\.julia\packages\GDAL\vec6Y\deps\usr\bin folder to get those commands working. Again, this would all have been easy were it not for the fact that my demo shapefile happened to lack that index that most other shapefiles have. Bad luck I guess, but I learned a lot in the process. 1 Like About the GMT install problem I mentioned: During installation of the x64 version of gmt-5.4.4 I got an error message along the lines of “Path too long, could not update the PATH variable”. This happened even when I tried to install to C:\gmt. But the install completed anyway, so I installed GMT.jl in Julia 1.0 and tried your example: julia> using GMT julia> coast(DCW="AR+p0.25p+gblue", frame=:a) ERROR: error compiling #coast#120: error compiling finish_PS_module: error compiling showfig: error compiling gmt: could not load library "gmt_w64" The specified module could not be found.  This seems related to that path error. Do you think I can just update my system path variable manually to get this working? Do you know what folders I need to add? Another thing that would be useful would be if there were a simple way to access the command line versions of gdal_rasterize , ogrinfo , ogr2ogr , etc. I had to copy a bunch of lib*.dll files into my $HOME\.julia\packages\GDAL\vec6Y\deps\usr\bin folder to get those commands working.

1 Like

That is an unfortunate Windows issue, apparently when user paths are already long. I think you manage to solve it by manually editing the path and the gmt\bin directory. Then, it it works on command line (just type psxy - on the cmd shell) it will work on GMT.jl too.

You already have it in the gdal that is installed with GMT, … but GDAL.jl does not know how to find it.

This is an excellent thead btw, thanks!

I have myself been annoyed with ArchGDAL.jl (which I wrote) from time to time, in having to remember what the commands are for various things (GDAL is an enormous library), and looking them up in the tests. But it has grown friendlier with time, and is getting close to the point where it can easily support a wide range of other things.

A key thing has been for us to learn what the right abstractions are – which takes a lot of time (I’m trying to wrap up my PhD and am not a “GIS Expert”) and often involve long discussions (see e.g. [1], [2] and [3]). It helps everyone a lot to have more of such questions, alongside how they’re currently being done, be it

I really like the amount of effort you put into your questions, and how thoroughly you follow-up on them – these stuff does help various contributors a great deal in making progress, and I’m hoping there’ll be more to come!

3 Likes

Would it be possible to compile the SQLite driver into GDAL.jl?

I’ve opened an issue for it in GDALBuilder. Feel free to file future requests in that github repository.

Actually, there seems to be more to it. The fact that you didn’t get a warning about a missing GMT when you did using GMT means that it was installed.

On the other hand, the error message could not load library "gmt_w64" The specified module could not be found normally means that some dll dependency is missing or conflicting. I don’t get that error and that GMT5.4 installer has been used by many people so it’s difficult to me to solve that.

And it’s not over. This issue made me find that there a bug in GMT when pscoast -E is used and opened an issue in GMT to track it.

But it works with the GMT6dev version, so my recommendation is that you install the it (and previously remove GMT5.4) from here.

What @yeesian linked to about supporting the gdal utils in ArchGDAL.jl was about the library functions. Besides these, I also want the command line tools of GDAL from GDALBuilder to be working. If you could put it up as an issue at the GDALBuilder repo I’ll have a look at it. Ideally (if you remember) also listing what you had to do to get it to work. Thanks!

I’ll admit that I was disappointed when I started this thread and noticed that the Geo subforum had a grand total of just 7 threads since February 2017. So my expectations for getting help on a problem related to geospatial data were set very low. Even considering that, the level of engagement from the developers shown here is amazing. Maybe it’s just the “OMG OMG we actually have a USER!!” factor, but whatever the reason, thanks guys!

Done.

Maybe some progress: I uninstalled v5.4, installed GMT6dev and added GMT6\bin to my path. Now I get a different error when running ]test GMT or the Hello World example in the GMT docs:

6.0.0_r20419
ERROR: LoadError: GMT: Failure to open virtual file
Stacktrace:
[1] error(::String) at .\error.jl:33
[2] GMTJL_Set_Object(::Ptr{Nothing}, ::GMT.GMT_RESOURCE, ::Array{Float32,2}) at C:\Users\niclas\.julia\packages\GMT\SwD1H\src\gmt_main.jl:979
[3] gmt(::String, ::Array{Float32,2}) at C:\Users\niclas\.julia\packages\GMT\SwD1H\src\gmt_main.jl:262
[4] top-level scope at C:\Users\niclas\.julia\packages\GMT\SwD1H\test\runtests.jl:16
[5] include at .\boot.jl:317 [inlined]
[7] include(::Module, ::String) at .\sysimg.jl:29
[8] include(::String) at .\client.jl:388
[9] top-level scope at none:0
in expression starting at C:\Users\niclas\.julia\packages\GMT\SwD1H\test\runtests.jl:13
ERROR: Package GMT errored during testing


And when I try the DCW example in your first post above I get an error from pscoast that suggests I download DCW manually and install it to “DIR_DCW, the shared dir, or the user dir”. Where do you suggest I put it?

Honestly, I wasn’t enamored of the ArchGDAL syntax you posted above, probably because it triggered some latent PTSD from asyncronous nested callback hell in Node.js. Or maybe I just want things like drivers and C-pointers to be abstracted away. But I understand this is work in progress! I’m just glad I can get something to work right now. If I need nicer call syntax I’ll just wrap your code in a function.

In addition to Python and R you should look at Matlab syntax for shapefiles, it’s pretty straightforward. You use shapeinfo() to read the metadata into a standard Matlab struct, shaperead() to extract vector data or vec2mtx() to rasterize.

1 Like

Can we please move this discussion (the GMT working errors) into a GMT.jl issue? It will be more effective and no need to bother readers with two threads going on interlaced. Thanks.

One last question: is it possible to use GDAL/ArchGDAL to rasterize the shapefile directly to a Julia array, without saving a TIFF to disk? The “in memory raster” in the GDAL example hinted at this, but I couldn’t get it to work.

import GDAL
using ArchGDAL; const AG = ArchGDAL

raster = AG.registerdrivers() do
dest_ds = AG.Dataset(GDAL.rasterize(
Ptr{GDAL.GDALDatasetH}(C_NULL),
source_ds.ptr,
GDAL.rasterizeoptionsnew([
"-of", "MEM",
# "-l", "gadm36_ARG_1",   # The layer(s) used for input features.
"-ts", "250", "250",      # target size: "-ts xsize ysize". Set it to whatever you want.
"-ot", "Byte"
], C_NULL), C_NULL))
AG.destroy(dest_ds)
result
end
end


Honestly, I wasn’t enamored of the ArchGDAL syntax you posted above, probably because it triggered some latent PTSD from asyncronous nested callback hell in Node.js.

This also works:

import GDAL
using ArchGDAL; const AG = ArchGDAL

GDAL.allregister()
dest_ds = AG.Dataset(GDAL.rasterize(
Ptr{GDAL.GDALDatasetH}(C_NULL),
source_ds.ptr,
GDAL.rasterizeoptionsnew([
"-of", "MEM",
# "-l", "gadm36_ARG_1",   # The layer(s) used for input features.
"-ts", "250", "250",      # target size: "-ts xsize ysize". Set it to whatever you want.
"-ot", "Byte"
], C_NULL),
C_NULL
))

But it’s not obvious to explain which objects you need to call ArchGDAL.destroy() on and which objects you don’t. You can see some of the experiences people had from using GDAL’s python bindings (see [1] and [2]).
Thank you very much @yeesian, now everything works! And now that I understand the potential destroy() confusion the syntax feels more reasonable.