How to convert DataFrame to Raster?

I’m very new to Julia and coming from R background I’m now exploring spatial analysis in Julia. My current and maybe silly question is, how can I convert a dataframe that includes x and y coordinates to raster or raster stack? Are there any straightforward solutions similar to rast(…, type=”xyz”) in R’s terra package? I was able to convert a Raster object to DataFrame via using DimTable (don’t know if that is the best way), but have not figured out any easy solution how to do the conversion from DataFrame to Raster.

(Oops accidentally deleted this)

1 Like

Thanks! So the conversion from raster to dataframe is much more straightforward than I assumed – very nice feature. I was approaching this from R perspective and didn’t even bother to look at the rasterize to do the conversion from DataFrame to Raster, my bad. However, I’m still having problem with this. How to define correct keyword arguments when rasterizing DataFrame? I tried to read the docs for rasterize, but did not quite understand it. Here is an example:

using Rasters 
using RasterDataSources
using Plots
using DataFrames

## ENV["RASTERDATASOURCES_PATH"] = "/home/miikka/pCloudDrive/data/"

## Reading bioclim variables 1 and 12 as a raster stack 
A = read(RasterStack(WorldClim{BioClim}, (1, 12))) |> replace_missing

## converting to dataframe. Very straightforward indeed!
dfA = DataFrame(A)

## But how to convert the DataFrame back to a Raster? 
B = rasterize(dfA)

Hmm ok, it seems what you want (and what terra is doing) is auto-detecting the dimension extent and resolution from the columns? (that information is lost when you convert to a dataframe).

It would be good to have auto detection like that, but I haven’t written it. I have some questions to clarify what you want:

  • So that I understand your workflow a little better, why do you convert to a DataFrame and then back to a raster?
  • Have you subsetted the dataframe so rows are missing?
  • Have you modified column values?
  • Do you also intend to create a raster from a DataFrame with points that were never part of a raster to begin with?

The main question for me is what do you want the resolution of each dimension to be?. I am guessing terra is doing some work to calculate it from the dataframe columns. Do you know where it gets the resolution from and what it is in your example? Especially getting the exact resolution back could be tricky if there are a lot of missing rows. If you remove every second row would you want the resolution to be halved for one of the dimensions?

What does terra do if you use arbitrary points that were never in a Raster? what resolution do you get?

This should work with rasterize:

B = rasterize(dfA; to=A, fill=(:bio1, :bio12))

But you can see we still need to specify to=A for the extent and resolution, and specify which columns to use for data. More of that could be automated. This will also be slower than necessary for a huge raster as it is not assuming all points are exactly on the grid - it’s written to handle points that are not from a raster originally, and looks up every single point in the dimension index. The half second is surprisingly fast seeing it does an index search 4 million times.

(Also: if you have time to play with rast in terra and find out what it does with weird dataframes, that would be amazing. Like where there are many rows missing, or poiints that you never extracted from a dataframe - that don’t fit on any grid, have some some very close to each other or exactly duplicated points.

Knowing what to do in the edge cases is the main problem with doing things automatically.

If you do have time, please make a github issue for Rasters.jl detailing what it does with different kinds of dataframes, and we can try to replicate that.)

@MiikkaT as an alternative approach you can load GeoStats.jl and do georef(df, (:x, :y)). Is your data really a raster? Usually when you only have columns x and y in a table you have a point set (i.e. no grid topology).

I suggest reading the entire documentation once to learn about what is available:

https://juliaearth.github.io/GeoStats.jl/stable

In the example they make a dataframe from a raster, so it can be rebuilt into a raster.

The problem is making it slightly easier than that line of code I posted above using rasterize - which does what they want, but with more user input than terra rast needs.

@Raf Thanks, your rasterize example works perfectly! Although, like you said, it might be a bit too heavy duty solution for my problem.

In the workflow I refer above, after the raster to dataframe conversion, I would modify column values, add new columns that are functions of other columns and maybe even re-order rows. However, I would not add new rows or subset the data. X and Y values of a row also remain intact. So even if rows are rearranged, it is possible to convert dataframe back to raster with modified and/or added columns. Obviously it would not be possible if the grid is not regular. I would assume this is very straightforward for terra (and would also for Rasters), but if there would be irregularities, rasterize would be the only option in terra as well. Terra would throw an error, if you try to make a raster from a matrix/dataframe where coordinates would produce an irregular grid.

I inherited this workflow from R where it is sometimes nicer to work with dataframes rather than rasters or raster stacks directly, e.g., when predicting using some machine learning models. It might be that this is not necessary in Julia. But its good anyway that the raster-dataframe-raster conversion is easily available.

@juliohm Will definitely start to learn GeoStats.jl as well!

I’m sure I will keep bugging you both while familiarizing myself to spatial analysis in Julia!

From your description of column-based workflows consider this post:

https://discourse.julialang.org/t/ann-geostats-jl-v0-36/90708

There is a vídeo that you may find useful.

Thanks for the details! Does terra still work when you remove a lot of rows? It would be good to know exactly where the errors are thrown.

And yes for what you describe it might be easier to work with the Raster directly. Generally in Julia we use dataframes less than in R, because the other objects like AbstractArray and NamedTuple are so easy to use and work everywhere. Raster is just an AbstractArray and works most places an Array works.

And if you are going to bug people (please do) the best place is via github issues.

That is how features are added to Rasters.jl (and most julia packages really) - I only work on things I need personally or issues that are in the queue.