NCDatasets: performance issues deriving fast Minimum Maximum value as too slow

Raf · August 21, 2024, 11:06pm

That’s not a very large array, so fully load it into memory first by doing Array(Data), and then do skipmissing on the result.

I can’t know exactly because you don’t provide the file.

But the problem is most likely that NCDatasets.jl doesn’t properly implement the DiskArrays.jl interface for CFVariable, so it calls the NetCDF C library for literally every single pixel rather than loading larger chunks. (@Alexander-Barth this is an example where making AbstractVariable a DiskArray has direct benefits to users - iteration and generators are chunked)
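To make the per-pixel versus chunked difference concrete, here is a toy model in plain Julia. It is only a sketch: `CountingStore`, `readblock`, and `chunkcols` are hypothetical names made up for illustration and are not part of NCDatasets.jl or DiskArrays.jl. Each `readblock` call stands in for one trip to the C library.

```julia
# Hypothetical stand-in for a disk-backed variable: every call to `readblock`
# counts as one trip to the underlying library.
mutable struct CountingStore
    data::Matrix{Float64}
    reads::Int
end

readblock(s::CountingStore, rows, cols) = (s.reads += 1; s.data[rows, cols])

# Per-element access: one read for every pixel.
function extrema_per_pixel(s::CountingStore)
    lo, hi = Inf, -Inf
    for j in axes(s.data, 2), i in axes(s.data, 1)
        x = readblock(s, i, j)
        lo, hi = min(lo, x), max(hi, x)
    end
    return lo, hi
end

# Chunked access: one read per block of `chunkcols` columns.
function extrema_chunked(s::CountingStore; chunkcols = 4)
    lo, hi = Inf, -Inf
    for cols in Iterators.partition(axes(s.data, 2), chunkcols)
        block = readblock(s, :, cols)
        blo, bhi = extrema(block)
        lo, hi = min(lo, blo), max(hi, bhi)
    end
    return lo, hi
end

a = CountingStore(reshape(collect(1.0:64.0), 8, 8), 0)
b = CountingStore(copy(a.data), 0)
res_pixel = extrema_per_pixel(a)   # a.reads is now 64
res_chunk = extrema_chunked(b)     # b.reads is now 2
```

Both functions return the same extrema, but the read count drops from one per pixel to one per chunk, which is the effect chunked iteration is meant to give.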

Either way for data this size, just load it into memory before use, it will always be faster.
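A minimal sketch of the in-memory reduction, using a small synthetic array in place of `Array(Data)` since the original file isn’t available. The values and the helper name `valid_extrema` are assumptions for illustration only.

```julia
# Synthetic stand-in for `Array(Data)`: an in-memory array mixing real values,
# `missing`, and NaN (made-up values, not from the original file).
data = reshape(Union{Missing, Float64}[0.12, NaN, missing, 0.05, 0.30, 0.41], 2, 3)

# Single pass over memory: skip `missing`, filter out NaN, reduce to (min, max).
function valid_extrema(A)
    return extrema(x for x in skipmissing(A) if !isnan(x))
end

Pmin, Pmax = valid_extrema(data)
```

Note that `extrema` throws on an empty iterator, so an array with no valid values would need handling separately.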

One other way to speed it up immediately is to use Rasters.jl, which should be fast because it just loads to memory - but also will be faster from disk (via lazy=true) because it implements DiskArrays.

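using Rasters, NCDatasets  # NCDatasets must be loaded for Rasters to read NetCDF files
rast = Raster(Path; name=:SoilWater)  # loads the SoilWater variable into memory
Pmin, Pmax = extrema(x for x ∈ skipmissing(rast) if !isnan(x))  # skip missing and NaN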

(In a few weeks, with the CF standards PR, it will be even faster, because you will be able to intentionally not convert to missing and use the native missing value in skipmissing, which turned out to be much faster than what Base does with missing)
