How to resample (downscale) a one-hot encoded (boolean) raster?

I have “one-hot” encoded a 1x1km soil category raster, so I have a raster representing a specific soil class, 0 when is not in the class, and 1 otherwise.

I would now downscale it to match my actual other data at 8x8km, not necessarily aligned grid, and I would like to have in this downscaled layer the average of the values in the original 1x1km grids, possibly weighted by the area to consider alignment issues, so to have the ratio of the soil class in the large pixel.
For example if in the 1x1 km pixels 16 are ones and 48 are zeros I would like to have 0.25.
I thought for that to use the function resample with method=:average but I am left with all zeros or ones:

using Rasters, Downloads
import ArchGDAL

Yfilename ="")
Yraster = Raster(Yfilename)

soil_filename ="")

soil = Raster(soil_filename) # 1km grid

# Getting 1/0 values for a specific class 
(nC,nR)  = size(soil)
soil_cl12 = deepcopy(soil)
for c in 1:nC
    for r in 1:nR
        soil[c,r] == soil.missingval && continue
        soil[c,r] == 12 ? soil_cl12[c,r] = 1 : soil_cl12[c,r] =  0

soil_cl12_eu = resample(soil_cl12,to=Yraster,method=:average)
write("soil_orig.tif", soil_cl12, force=true)

The issues seems to be that soil_cl12_lr is a integer raster so it doesn’t accept floats.
Is this the issue? Is it possible to convert a raster of integers to one of floats ?

As a workaround I have changed the one-hot encoder and use 1000 as a value instead of 1, but then of course I have no longer ratios…

I tried to run your example but I can’t download the Yraster because I get a time out error.

Could you copy the full error that you get so that we can help you?

wait, I put it on my server too…

I updated the script to download it from my server.

I don’t have an error, just that the output instead of having a float (e.g. 0.25 in the example above) has only integers 0/1, while when we compute the average we should get a float as result.

Instead of deepcopying the data that you than do not need anyways you can use similar to generate a new dataset with the same size of soil but with a Float32 eltype. But the best way to deal with that is to use a map instead of doing the for loop yourself.
Then the map sees, that the result should be Float32 and then it is used from there.

using Rasters, Downloads
using ArchGDAL
Yfilename ="")
Yraster = Raster(Yfilename)

soil_filename ="")

soil = Raster(soil_filename) # 1km grid

soil_cl12 = map(x-> x==12 ? 1.0f0 : 0.0f0, soil)

soil_cl12_lr = resample(soil_cl12,to=Yraster,method=:average)

I am not sure, why the missing data is not properly handled but at least you get values between 0 and 1.

1 Like

That’s because everything that is not 12, including the missing value, becomes zero.
By adding a control for it it works fine:

missval       = soil.missingval
missval_float = Float32(soil.missingval)
soil_cl12 = map(x-> ( (x == missval) ? missval_float : (x == 12 ? 1.0f0 : 0.0f0 ) ), soil)

It is still a bit of mystery to me how the map is able to return a whole Raster object rather than a matrix, but that’s great. Thank you very much!

1 Like

Julia gives us pretty good options to keep the raster properties through most functions that can be applied to an array - like similar, which a low level method underneath a lot of array allocations, which we add methods for in DimensionalData.jl so any AbstractDimrnsionalArray (like a Raster) will keep its properties whenever similar is called.

Also, for correctness missingval(raster) is better than raster.missingval, as field access is an internal detail, and for e.g. a RasterStack it does something quite different.