Appending an element to a JLD2 file

rdeits · October 13, 2017, 3:54pm

I’m curious what the right way is to append elements to a vector stored in a JLD2 file, assuming that I want to save the data to disk every time I add an element (because each element is expensive to compute).

The first thing I tried was:

using JLD2

jldopen("test.jld2", "a+") do file
    file["x"] = []
end

for i in 1:10
    jldopen("test.jld2", "a+") do file
        push!(file["x"], 1)
    end
end

but that leaves the stored vector empty, because file["x"] returns a copy of the data, and appending to that copy does not change the data on disk.

So, instead, I could do:

results = []
for i in 1:10 
    push!(results, 1)
    jldopen("test.jld2", "w") do file
        file["x"] = results
    end
end

but this will write the entire results vector to disk at every iteration, which seems wasteful.

Am I missing something obvious? Is there a better way?

ggggggggg · October 13, 2017, 4:31pm

HDF5 (on which JLD is based) is not very efficient with small writes, and by default leaves you open to losing data if your program crashes before flushing. The new single writer multiple reader (SWMR) feature allows you to flush a specific dataset, and I believe it syncs the dataset and metadata to disk. I think if you use SWMR, or if you just flush the file after every write, you will be safe.

Use HDF5.jl directly to create a dataset with extendible dimensions (search for extendible), and push to that. The SWMR API is not in the HDF5.jl docs, but the swmr.jl test file shows the usage.
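A rough sketch of the extendible-dataset idea, assuming a recent HDF5.jl in which `create_dataset` and `HDF5.set_extent_dims` are available (the function names in the 2017 releases were different). The file path, the chunk size of 64, and the stand-in computation are placeholders, not anything from the original posts:

```julia
using HDF5

path = tempname() * ".h5"   # hypothetical output path

h5open(path, "w") do file
    # Current length 0, maximum length -1 (unlimited).
    # Extendible datasets must be chunked; 64 is an arbitrary chunk size.
    dset = create_dataset(file, "x", Float64, ((0,), (-1,)), chunk=(64,))
    for i in 1:10
        value = Float64(i)                  # stand-in for an expensive computation
        HDF5.set_extent_dims(dset, (i,))    # grow the dataset by one element
        dset[i:i] = [value]                 # write only the new element
        flush(file)                         # push buffered data to disk after each write
    end
end

x = h5read(path, "x")   # read the whole vector back
```

Only the new element is written on each iteration, so the cost per step does not grow with the length of the vector. This does not use SWMR; it only flushes the file after every write.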

ggggggggg · October 13, 2017, 4:34pm

Also, both of your examples open and close the JLD file on each iteration, which is expensive. You might consider writing to a flat binary file on each iteration as a backup, and just writing the complete results in JLD at the end. Or keep the file open and store each element under its own key:

jldopen("test.jld2", "w") do file
    results = []
    for i in 1:10
        push!(results, 1)
        file["x$i"] = results[i]
    end
end
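The flat binary backup mentioned above could look roughly like this. It is a minimal sketch using only Base Julia, assuming every element is a `Float64`; the `backup` path and the stand-in computation are made up for illustration:

```julia
backup = tempname() * ".bin"   # hypothetical backup path

# Keep the file open for the whole loop and flush after each element,
# so a crash loses at most the element being computed.
open(backup, "a") do io
    for i in 1:10
        value = Float64(i)^2   # stand-in for an expensive computation
        write(io, value)       # appends 8 raw bytes in native byte order
        flush(io)
    end
end

# Recovery: the file size gives the number of Float64 values written.
n = filesize(backup) ÷ sizeof(Float64)
recovered = Vector{Float64}(undef, n)
read!(backup, recovered)
```

The raw file carries no type or shape information, so it is only useful as a crash backup; the complete results would still be written to JLD2 once at the end.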

rdeits · October 13, 2017, 8:06pm

Ok, makes sense. Thanks!

rdeits · October 20, 2017, 8:06pm

x-ref: HDF5 speed?