Why JLD2.jl is 40x slower than Arrow.jl

Just checked HDF5.jl for comparison.

The results compared to JLD2:

HDF5 appends 2-5x faster when files are big, but 2-5x slower when files are small.
A full read with HDF5 is 10x slower, which is quite bad.
File size is about 2x what JLD2 generates.

FYI: see "Advanced Usage" in the Julia Data Format (JLD2) docs
(but it probably won’t help much here)

JLD2 needs to parse some of a file's contents before it can append anything. So if you read first and then append, you can save time by keeping the file open between the two steps (open it with "r+" right away):

using JLD2

# fn is the path to an existing JLD2 file
f = jldopen(fn, "r+")
try
    read_data = JLD2.loadnesteddict(f)

    # append here, while the file is still open

finally
    close(f)
end
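
For reference, here is a minimal sketch of the same read-then-append pattern using the do-block form of jldopen, which closes the file automatically. The file path and the dataset names "a" and "b" are made up for illustration; this only shows the shape of the pattern, not the original poster's actual cache layout.

```julia
using JLD2

# Hypothetical cache file, created here only so the sketch is self-contained.
fn = joinpath(mktempdir(), "cache.jld2")
jldsave(fn; a = [1, 2, 3])

# Open once in "r+" mode, read existing data, then append in the same session.
jldopen(fn, "r+") do f
    existing = f["a"]          # read an existing dataset
    f["b"] = existing .* 2     # append a new dataset without reopening the file
end

# Reopen read-only to check that both datasets are present.
a_back, b_back = jldopen(fn, "r") do f
    (f["a"], f["b"])
end
```

The point is that the file metadata is parsed only once for both the read and the append, instead of once per jldopen call.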

Also, in your use case, JLD2 and HDF5 should be compatible, e.g. you can write files with JLD2 and read (edit?) them with HDF5. (The reverse, writing with HDF5 and reading with JLD2, is also possible, but editing is not.)
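
A rough sketch of the JLD2-write / HDF5-read direction, assuming the data is a plain numeric array (custom structs or compressed datasets may not round-trip this simply). The path and the dataset name "x" are hypothetical.

```julia
using JLD2, HDF5

# Hypothetical file path, used only for this sketch.
fn = joinpath(mktempdir(), "compat.jld2")

# Write a plain Float64 vector with JLD2.
jldsave(fn; x = collect(1.0:5.0))

# Read the same dataset back through HDF5.jl.
x_hdf5 = h5open(fn, "r") do h
    read(h, "x")
end
```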

No, I am fine with the narrowest compatibility. The caches can be regenerated anywhere, anytime; it just needs to be efficient.