I work with atmospheric simulations and I use NetCDF files on the order of 80GB to 1TB.
I am always jealous of my friend from the office (@lsterzinger), who does everything super fast in Python with xarray, zarr, dask, and the rest of that ecosystem for out-of-core computations. Now, I know some attempts have been made at getting an xarray-like package for Julia, but I had not used them.
I am asking for your help to improve the performance of my toy example.
- Before starting, please note that getting reproducible times is not easy because there is apparently some data caching: if you re-run with the same file it will be much faster the second time. I am not talking about compilation time but about data caching. If anyone has tips I would be glad to hear them (see the cache-dropping sketch after this list). Therefore, please don’t pay too much attention to the specific timings. The take-home message is that, at its best, Julia takes around 5x the best timing of the Python script.
- Also note that the real dataset is much larger. I don’t know the exact impact, but having large files is an important part of the problem (performance worsens with larger files), so subsetting was needed to be able to share the data.
- In this case I don’t think using BenchmarkTools is appropriate: I think it would suffer from the same data caching, so more than one run may not be informative.
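This is what I mean by the caching issue and what I would like to work around; a minimal sketch, assuming a Linux machine where you can sudo (the helper name is made up):

# Linux only, needs sudo: ask the OS to drop its page cache so that the next
# run really reads the file from disk again instead of from memory.
function drop_os_caches()
    run(`sync`)  # flush dirty pages first
    run(pipeline(`sudo tee /proc/sys/vm/drop_caches`, stdin=IOBuffer("3"), stdout=devnull))
end

drop_os_caches()  # call this between timings of the same file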
For this example I have three toy datasets that can be downloaded here. WARNING: 15GB of data:
speeds.nc NetCDF with two 2D variables, not chunked
direct link: https://ucdavis.box.com/shared/static/yvy3hyvjli45pjroycyxjxs7y7vm61uw.nc
speeds_chunked.nc NetCDF with two 2D variables, chunks of 256,256,400
direct link: https://ucdavis.box.com/shared/static/0cvcwld3w9govhv5ijaa1e970c69x0xd.nc
speeds.zarr zarr with two 2D variables, chunks of 256,256,400
direct link: https://ucdavis.box.com/shared/static/4g9gtnfg3qsnx4z2f67yf8f76ikkdynh.tar
What I do (very inefficient):
using NCDatasets
function get_max_sp_ncdatasets(file)
    # Preallocate buffers for the full variables (x, y, time)
    usfc = Array{Float32,3}(undef, 512, 512, 2400)
    vsfc = Array{Float32,3}(undef, 512, 512, 2400)
    maxsp = Dataset(file) do ds
        # Read both variables entirely into memory
        load!(ds["USFC"].var, usfc, :, :, :)
        load!(ds["VSFC"].var, vsfc, :, :, :)
        # Wind speed, then its maximum over the horizontal dimensions
        mapslices(maximum, sqrt.(usfc.^2 .+ vsfc.^2), dims=(1, 2))
    end
    return maxsp[:]
end
file_unchunked = "./speeds.nc"
file_zarr = "./speeds.zarr"
file_chunked = "./speeds_chunked.nc"
@time get_max_sp_ncdatasets(file_unchunked)
#57.155043 seconds (3.79 M allocations: 7.236 GiB, 0.59% gc time, 4.82% compilation time)
@time get_max_sp_ncdatasets(file_chunked)
#16.300895 seconds (27.90 k allocations: 7.033 GiB, 0.33% gc time)
Many things are wrong with this. In particular, I need to load the whole dataset into memory, and it is very slow. Let’s throw it out the window, but note that chunking had a very good impact.
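One direction I have been considering is to stream the data in blocks along the time axis instead of holding both full variables plus the speed array in memory. A minimal, untimed sketch, still with NCDatasets; the function name and the block size of 400 (picked to line up with the file chunking) are my own choices:

# Read the variables in blocks along time so only one block of U and V lives
# in memory at a time, instead of the two full 512x512x2400 arrays.
function get_max_sp_blocked(file; nx=512, ny=512, nt=2400, bt=400)
    maxsp = Vector{Float32}(undef, nt)
    usfc = Array{Float32,3}(undef, nx, ny, bt)   # reusable block buffers
    vsfc = Array{Float32,3}(undef, nx, ny, bt)
    Dataset(file) do ds
        u = ds["USFC"].var
        v = ds["VSFC"].var
        for t0 in 1:bt:nt
            tr = t0:(t0 + bt - 1)
            load!(u, usfc, :, :, tr)
            load!(v, vsfc, :, :, tr)
            # maximum of the squared speed over (x, y); sqrt once per time step
            maxsp[tr] .= sqrt.(vec(mapslices(maximum, usfc.^2 .+ vsfc.^2, dims=(1, 2))))
        end
    end
    return maxsp
end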
What my pythonist friend does:
import xarray as xr
import zarr as zr
def get_max_sp_xarray(file, zarr=False):
    if zarr:
        ds = xr.open_zarr(file)
    else:
        ds = xr.open_dataset(file)
    usfc = ds.USFC                     # Does not compute anything. Instantaneous
    vsfc = ds.VSFC                     # Does not compute anything. Instantaneous
    sp = (usfc**2 + vsfc**2)**(0.5)    # Does not compute anything. Instantaneous
    return sp.max(dim=('x','y')).compute()
file_unchunked = './speeds.nc'
file_zarr = './speeds.zarr'
file_chunked = './speeds_chunked.nc'
%time get_max_sp_xarray(file_unchunked)
# CPU times: user 5.39 s, sys: 8.21 s, total: 13.6 s
#Wall time: 13.6 s
%time get_max_sp_xarray(file_chunked)
#CPU times: user 6.16 s, sys: 14.5 s, total: 20.7 s
#Wall time: 20.7 s
%time get_max_sp_xarray(file_zarr, zarr=True)
#CPU times: user 10.1 s, sys: 43.5 s, total: 53.5 s
#Wall time: 4.43 s
Faster, the memory footprint is much lower, and the timing even includes plotting! I think the example with zarr is the one to beat, although in my current project I do not have everything on zarr; the fact that its wall time is so much lower than its CPU time suggests dask is reading the zarr chunks in parallel. I am not sure why the chunked version was slower than the unchunked one, probably some pressure on the machine.
My attempt with YAXArrays.jl
YAXArrays is an xarray-like Julia package.
using YAXArrays, NetCDF, Zarr
function get_max_sp_yaxarrays(file)
    ds = open_dataset(file)
    # Lazy element-wise wind speed from the two components
    sp = map(ds["USFC"], ds["VSFC"]) do usfc, vsfc
        sqrt(usfc^2 + vsfc^2)
    end
    # Maximum over the horizontal dimensions
    maxsp = mapslices(maximum, sp, dims = ("x", "y"))
    return maxsp[:]
end
file_unchunked = "./speeds.nc"
file_zarr = "./speeds.zarr"
file_chunked = "./speeds_chunked.nc"
@time get_max_sp_yaxarrays(file_unchunked)
#172.770939 seconds (53.90 M allocations: 10.256 GiB, 1.46% gc time, 15.46% compilation time)
@time get_max_sp_yaxarrays(file_chunked)
#16.795537 seconds (818.22 k allocations: 9.506 GiB, 2.21% gc time, 2.03% compilation time)
#Warning: There are still cache misses
@time get_max_sp_yaxarrays(file_zarr)
#60.923438 seconds (24.11 M allocations: 21.867 GiB, 6.29% gc time, 23.03% compilation time)
#Warning: There are still cache misses
The version with zarr leaves much to be desired; the version reading the chunked NetCDF seems competitive, but it still takes about 4 times as long as the fastest Python version.
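Since the dask+zarr combination seems to win by reading independent chunks in parallel, I also sketched that idea with the Zarr package (already loaded above) directly. I have not benchmarked this; it assumes the variables come out of the store with the same (x, y, time) = (512, 512, 2400) layout as the NetCDF files, that reads from the local directory store are thread-safe, and that Julia was started with several threads (julia -t auto):

# Rough, unbenchmarked sketch: read the zarr data in independent time blocks
# and reduce each block on its own thread. The 400-step block size is chosen
# to line up with the (256, 256, 400) chunking.
function get_max_sp_zarr_threads(store; nt=2400, bt=400)
    g = zopen(store)
    u = g["USFC"]
    v = g["VSFC"]
    maxsp = Vector{Float32}(undef, nt)
    Threads.@threads for t0 in 1:bt:nt
        tr = t0:(t0 + bt - 1)
        ub = u[:, :, tr]   # each block covers whole chunks
        vb = v[:, :, tr]
        maxsp[tr] .= sqrt.(vec(mapslices(maximum, ub.^2 .+ vb.^2, dims=(1, 2))))
    end
    return maxsp
end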
My attempts with Rasters.jl:
using Rasters
function get_max_sp_rasters(file)
    vsfc = Raster(file, key="VSFC")
    usfc = Raster(file, key="USFC")
    # Wind speed, then its maximum over the horizontal dimensions
    sp = sqrt.(vsfc.^2 .+ usfc.^2)
    maxsp = maximum(sp, dims=(X, Y))[:]
    return maxsp
end
file_unchunked = "./speeds.nc"
file_zarr = "./speeds.zarr"
file_chunked = "./speeds_chunked.nc"
@time get_max_sp_rasters(file_unchunked)
# 18.926890 seconds (62.42 k allocations: 11.723 GiB, 5.58% gc time)
@time get_max_sp_rasters(file_chunked)
# 17.946154 seconds (62.42 k allocations: 11.723 GiB, 5.65% gc time)
@time get_max_sp_rasters(file_zarr)
# 18.267268 seconds (62.42 k allocations: 11.723 GiB, 5.75% gc time)
Interestingly, the timing is very consistent across files. I would say the performance is very good (timing similar to NCDatasets), but it cannot beat Python with zarr.
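One small, untimed tweak on the function above (same Rasters calls, only the order of operations changes): since sqrt is monotonic, the square root can be taken after reducing over (x, y), so the full-size sqrt array is never allocated:

# Untimed variant: compute the squared speed and take sqrt only of the
# per-time-step maxima, saving one full-size temporary array.
function get_max_sp_rasters_sq(file)
    vsfc = Raster(file, key="VSFC")
    usfc = Raster(file, key="USFC")
    sp2 = vsfc.^2 .+ usfc.^2                     # squared speed
    return sqrt.(maximum(sp2, dims=(X, Y))[:])   # same maxima, sqrt applied last
end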
My environment (Julia 1.7.1):
Status `/global/u2/a/aramreye/JuliaVsPythonReadingTests/Project.toml`
[13f3f980] CairoMakie v0.7.2
[85f8d34a] NCDatasets v0.11.9
[30363a11] NetCDF v0.11.4
[a3a2b9e3] Rasters v0.2.0
[b8865327] UnicodePlots v2.8.0
[c21b50f5] YAXArrays v0.1.4
[0a941bbe] Zarr v0.7.1