Help me with Julia/R comparison project, how to plot US county maps in Plots/StatsPlots (or what's best alternative)?

I’m working with Phil Price over at Andrew Gelman’s blog https://statmodeling.stat.columbia.edu/ to do a comparison between using R and using Julia for basic data analysis stuff.

We’ve got a few simple tasks and are implementing them in both languages to show comparisons. In fact, it started because I expressed frustration with the Tidyverse and its reliance on Fexprs but it’s probably turning out to show essentially that for simple things both ecosystems let you do the simple stuff relatively easily and without fuss… which is fine. I’m OK with just “Julia is very usable if you’re familiar with R”. There is one example simulation based task where it’ll highlight the speed differences, and the existence of good general purpose computing tools such as data structures (the Julia implementation will use a “Set” whereas probably R would be done with vectors and a lot of copying)

However, R has a lot more time to develop specialized tools for visualizations. One task is to visualize some COVID data from the CDC. In R Phil is plotting a map of the US with each county colored. I don’t know how to do that in StatsPlots. Is there a simple way? Yes I know that Julia has other plotting tools, but I don’t want to complexify the comparison by highlighting the fact that there are like 5 viable plotting libraries (Plots/StatsPlots, VegaLite, Makie, Gadfly, PGFPlots, etc etc)

So, how can I plot the counties of the US on a map using StatsPlots/Plots ? If this is not directly possible, what is the primary 1 or 2 tools that would be recommended to plot very simple map plots of the US such as this one that Phil generated (showing that the dataset we were looking at had mostly missing data, grey, and we needed to find a better data source)

2 Likes

Would you accept a solution in any of these packages? I assume you wouldn’t have to present all of them.

I guess for the purposes of this thread I’ll accept any and all solutions, and then will try to choose the one that seems to provide the least resistance to present it :wink:

2 Likes

Maybe this is helpful

https://www.queryverse.org/VegaLite.jl/stable/examples/examples_maps/

2 Likes

Fyi, a GMT solution for this problem is provided here.

2 Likes

Nice the GMT version seems simple enough and uses a “plot” method which seems straightforward as an interface. the vegalite interface is fine for someone who’s familiar with vegalite and Julia and etc, but will I think be off-putting as an introduction to mapping on Julia.

One question: where do I get the shapefile for the US counties?

I guess to maybe answer my own question: would this be what I’m looking for? https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip from Cartographic Boundary Files - Shapefile

That zip file contains:

Archive:  cb_2018_us_county_5m.zip
  inflating: cb_2018_us_county_5m.shp.ea.iso.xml  
  inflating: cb_2018_us_county_5m.shp.iso.xml  
  inflating: cb_2018_us_county_5m.shp  
  inflating: cb_2018_us_county_5m.shx  
  inflating: cb_2018_us_county_5m.dbf  
  inflating: cb_2018_us_county_5m.prj  
 extracting: cb_2018_us_county_5m.cpg  

which since I’m not familiar with shapefiles leaves me with question how to actually use it?

Installing GMT.jl results in:

Failed to precompile GMT [5752ebe1-31b9-557e-87aa-f909b540aa54] to /var/local/dlakelan/dotjulia/compiled/v1.7/GMT/jl_6mcLAj.
ERROR: Caught signal number 11 (Segmentation fault) at
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x25)[0x7f0af0c2c945]
[0x557263ef6588]
Stack backtrace:
/lib/x86_64-linux-gnu/libgmt.so.6(sig_handler_unix+0xf4)[0x7f0af0e9b344]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c910)[0x7f0af0bde910]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x25)[0x7f0af0c2c945]
/lib/x86_64-linux-gnu/libproj.so.22(_ZN5osgeo4proj6common13UnitOfMeasureD1Ev+0x6f)[0x7f0aec85197f]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xc6)[0x7f0af0be1556]
/lib/x86_64-linux-gnu/libproj.so.19(+0xb8083)[0x7f0ae8c87083]
    Building Conda → `/var/local/dlakelan/dotjulia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/6cdc8832ba11c7695f494c9d9a1c31e90959ce0f/build.log`
    Building GMT ──→ `/var/local/dlakelan/dotjulia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/d1068159f18828ec1831efd34f1880f84a792b98/build.log`
ERROR: Caught signal number 11 (Segmentation fault) at
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x25)[0x7f5df1963945]
[0x555d3716e552]
Stack backtrace:
/lib/x86_64-linux-gnu/libgmt.so.6(sig_handler_unix+0xf4)[0x7f5df1bd2344]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c910)[0x7f5df1915910]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x25)[0x7f5df1963945]
/lib/x86_64-linux-gnu/libproj.so.22(_ZN5osgeo4proj6common13UnitOfMeasureD1Ev+0x6f)[0x7f5ded58897f]
/lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xc6)[0x7f5df1918556]
/lib/x86_64-linux-gnu/libproj.so.19(+0xb8083)[0x7f5de99be083]
ERROR: LoadError: failed process: Process(`gmt --show-library`, ProcessSignaled(11)) [0]

when I run gmt --show-library it gives the same segfault/backtrace. This is using the current Debian/testing version of GMT, anyone else having this issue?

There is a related thread here concerning GMT installation issues in linux, in case it helps.

1 Like

I reported a bug to Debian and they helped me figure out what was wrong (another package was keeping an older version of libproj installed) after appropriate upgrades and removing the old library gmt works on the command line, so I’ll make some effort to see if I can do something with the GMT.jl solution thanks much!

@rafael.guerra I have the shape file from the above mentioned county information, and would like to just see a map of all counties. The linked example produces the Choropleth after manipulating various things, but for just a first peek, I tried:

countymap = gmtread("data/cb_2018_us_county_5m.shp")
GMT.plot(countymap)

and get nothing. Can you offer suggestion as to how to just see outlines of all the counties, or something similar?

1 Like

@dlakelan Glad you were able to figure out the installing problem because I was clueless.

Regarding the plots, you must first understand that GMT works a bit differently from the other plotting packages (but, don’t they always do among each other?). In GMT, the plotting command adds a layer to the PostScript file. This means we never have the entire plot in memory but that we keep adding layers at need. We only display the figure when it’s finalized, and we do that by using the option show=true. The intermediate layers are added by using the ! notation. i.e.
plot(...); plot!(...); plot!(..., show=true)
since you didn’t provided the show option there is nothing to be … shown.

I tried that shapefile and there is something wrong somewhere (likely in the GMT.jl side) when I try to plot it all because many polygons are not shown, but it worked nicely whem imposing a map limit

countymap = gmtread("cb_2018_us_county_5m/cb_2018_us_county_5m.shp");
plot(countymap, region=(-125,-65,20,50), proj=:guess, show=true)

And if you want to follow that example with the Portuguese COVID map, I have to say that I should have updated it while I remembered better how to. Since then an attribute field has been added to the GMTdataset type and the function make_zvals_vec() was greatly improved to allow fetching polygons by attribute fields.

2 Likes

Great. Thanks for this help. I’d be interested in adding a GMT related tutorial to my archive of data analysis tutorials, both so I can learn more about mapping with GMT, and so others can benefit. Would you be interested in collaborating on that?

2 Likes

Well, yes. Tutorials and examples is a lagging area in GMT.jl

1 Like

Well, yes. Tutorials and examples is a lagging area in GMT.jl

Yes please. GMT and GMT.jl are great tools for mapping, but I do find them very complex. I know you can really do amazing plots and analysis with them, but as a novice they feel overwhelming. I’ve used them, would love to learn more, and tutorials would be a great tool for that.

1 Like

Great, I’ll PM you, and anyone who wants an invite to the discussion PM me and I’ll add you to the discussion. @alejandromerchan do you want to be on that PM?

I did some mapping in Julia a while aga, on a smaller scale (California counties) using VegaLite. Just like you, I had experience with R and I used a project I had as an excuse to learn more Julia. I haven’t done mapping since, so I don’t know how much things have changed. You’re welcome to loop me in, I gladly help if I can.

1 Like

There is this example I made a while ago using Plots.jl to plot US counties if that is interesting:

I also have a Makie example I think somewhere if that is also of interest!
Makie.jl was just a tad more complicated to get built than Plots’s version.

Did you use GeoMakie.jl? I haven’t used it but I know there’s an effort on developing that tool.

I did not!
I actually just used Makie.jl as GeoMakie.jl was not where I needed it to be.
I can see if I can find my example using a Shapefile plus Makie if that would be of interest @alejandromerchan :slight_smile:

1 Like

Hi all, I was just reminded of this thread because of a notification. Here is a link to the github where I added a bunch of mapping examples to the GMTMaps.jmd file GitHub - dlakelan/JuliaDataTutorials: Tutorials For Data Analysis in Julia

I don’t remember exactly what the status was because last updated 7 months ago, but I wanted to let people easily find the mapping examples. Enjoy

1 Like

@dlakelan I took the liberty to add your nice examples to the GMT.jl manual

2 Likes