Why is NPZ so much faster than JLD in this case?

Hello! First post here so sorry in advance :')

I am running numerical simulations on a large dataset over a wide range of parameters, and I was trying to save the results with JLD's save function. The array is around 12 GB when written to disk, but JLD took more than 2 hours to save it (it may have taken even longer — I eventually just canceled the job), whereas NPZ's npzwrite saved it in a couple of minutes.

I need missing values in the array because some (in fact most) of the simulations simply don't work, and I also use NaN to mark a different, specific point of failure. Is the issue just that the array is of type Union{Missing, Float64} and contains many NaN values? If so, I am shocked that JLD handles such a fairly simple array this poorly.
Thanks in advance! :slight_smile:

I don’t know anything about JLD vs NPZ, but your array should not be of type Union{Missing, Float64}. A NaN is a valid floating-point number, so your eltype should just be Float64.
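To illustrate the point — NaN is an ordinary IEEE 754 value, so it lives happily in a plain Float64 array with no Union type needed:

```julia
# NaN is just a Float64 bit pattern, so a concrete Float64 array can hold it:
x = [1.0, NaN, 2.5]

eltype(x)    # Float64 — no Union{Missing, Float64} required
isnan(x[2])  # true
```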


I do need both NaN and missing values, separately. A missing value means the simulation did not work at all, while a NaN means a specific function (run after the simulation) failed. I will eventually go back and try to fix the cases where NaNs arise, but I don’t want to touch the missing cases at all. So yes, I realize NaN counts as a Float64, but I also need the missing values to be there.
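If the Union eltype turns out to be the bottleneck, one way to keep both kinds of failure while still writing concrete arrays (which both NPZ and JLD handle quickly) is to split the data into a plain Float64 array plus a Bool mask before saving. A sketch, with `results` standing in for your actual array:

```julia
# Sketch: split a Union{Missing, Float64} array into a concrete Float64 array
# plus a Bool mask, so both pieces can be saved as plain arrays.
results = Union{Missing, Float64}[1.0, missing, NaN, 2.5]  # stand-in data

mask   = ismissing.(results)        # true where the simulation never ran
values = coalesce.(results, NaN)    # fill missings with NaN; eltype is Float64

# values and mask are what you'd pass to npzwrite / JLD's save.

# To reconstruct after loading:
restored = ifelse.(mask, missing, values)
```

NaNs with `mask == false` are your post-simulation failures; entries with `mask == true` are the simulations that never ran, so no information is lost in the round trip.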

Note that JLD is using HDF5 under the hood. Your answer lies in the underlying representation of the data. I’m not sure what JLD does with a Union{Missing, Float64}.

Another question is whether JLD is trying to use compression.
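It may be worth timing the save with compression explicitly on and off. If I remember the JLD API correctly it accepts a compress keyword (check the JLD.jl docs against your installed version — the exact keyword is an assumption here):

```julia
using JLD  # assumes JLD.jl is installed

A = rand(1000, 1000)  # small stand-in for the real 12 GB array

# Compare save times with and without compression; `compress` is the
# keyword I believe JLD's save accepts — verify against the JLD.jl docs.
@time save("uncompressed.jld", "A", A)
@time save("compressed.jld", "A", A, compress=true)
```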