NCDatasets: computing the minimum and maximum values is too slow

That’s not a very large array, so fully load it into memory first by doing Array(Data), and then do skipmissing on the result.
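Here is a minimal sketch of the load-then-reduce approach. The file, the `lon`/`lat` dimensions and the `-9999f0` fill value are made up so the example is self-contained; in practice `path` would be your own file. It reads with `ds["SoilWater"][:, :]`, which fetches the whole 2-D variable in one call. For an N-D variable, use one `:` per dimension.

```julia
using NCDatasets

# Hypothetical small test file standing in for the real data
path = joinpath(mktempdir(), "soil.nc")
NCDataset(path, "c") do ds
    defDim(ds, "lon", 3)
    defDim(ds, "lat", 2)
    v = defVar(ds, "SoilWater", Float32, ("lon", "lat"); fillvalue = -9999f0)
    # `missing` entries are stored in the file as the fill value
    v[:, :] = Union{Missing,Float32}[1 missing; 3 4; NaN 6]
end

pmin, pmax = NCDataset(path) do ds
    # One bulk read: the whole variable comes back as an in-memory Array
    data = ds["SoilWater"][:, :]
    # skipmissing and the NaN filter now run on plain memory, not on the file
    extrema(x for x in skipmissing(data) if !isnan(x))
end
```

Because the do-block's value is returned, `pmin` and `pmax` end up as the smallest and largest values that are neither missing nor NaN.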

I can’t know for sure, because you didn’t provide the file.

But the real problem is likely that NCDatasets.jl doesn’t properly implement the DiskArrays.jl interface for CFVariable, so it calls the NetCDF C library for every single pixel rather than loading larger chunks. (@Alexander-Barth this is an example where making AbstractVariable a DiskArray has direct benefits to users: iteration and generators are chunked.)
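To see why per-pixel access hurts, here is a toy model. It is not NCDatasets’ actual code: `CountingArray` is a hypothetical array type that counts each `getindex` call as one "library call". Iterating it element by element (which is what `skipmissing` on a lazy variable does) costs one call per element, while a single colon read costs one call in total.

```julia
# Hypothetical stand-in for a disk-backed variable: each getindex = one "C call"
mutable struct CountingArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    calls::Int
end
CountingArray(a::Array) = CountingArray(a, 0)

Base.size(a::CountingArray) = size(a.data)

# Scalar access, used by iteration: one call per element
function Base.getindex(a::CountingArray{T,N}, I::Vararg{Int,N}) where {T,N}
    a.calls += 1
    return a.data[I...]
end

# Bulk access with one colon per dimension: one call for the whole array
function Base.getindex(a::CountingArray{T,N}, ::Vararg{Colon,N}) where {T,N}
    a.calls += 1
    return copy(a.data)
end

raw = Union{Missing,Float64}[1.0 missing; 3.0 4.0; NaN 6.0]

# Lazy: skipmissing iterates the array, calling getindex for every element
lazy = CountingArray(raw)
r_lazy = extrema(x for x in skipmissing(lazy) if !isnan(x))

# Bulk: read once, then reduce over the in-memory copy
bulk = CountingArray(raw)
inmem = bulk[:, :]
r_bulk = extrema(x for x in skipmissing(inmem) if !isnan(x))
```

Both give the same result, but `lazy.calls` grows with the number of elements while `bulk.calls` stays at 1. With a real file each of those calls goes through the C library, which is where the time goes; chunked DiskArrays iteration avoids this by reading blocks instead of single values.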

Either way, for data this size just load it into memory before use; it will always be faster.

Another way to speed it up immediately is to use Rasters.jl, which should be fast because by default it just loads the data into memory, but it will also be faster reading from disk (with lazy=true) because it implements DiskArrays.

using Rasters, NCDatasets  # recent Rasters versions need NCDatasets loaded to read NetCDF
rast = Raster(Path; name=:SoilWater)
Pmin, Pmax = extrema(x for x ∈ skipmissing(rast) if !isnan(x))

(In a few weeks, with the CF standards PR, it will be even faster, because you will be able to deliberately skip the conversion to missing and use the native missing value in skipmissing, which turned out to be much faster than what Base does with missing.)
