Compared with JLD2, HDF5 appends 2-5x faster when files are big, but 2-5x slower when files are small.
A full read with HDF5 is 10x slower. This is quite bad.
HDF5 file size is 2x what JLD2 generates.
JLD2 needs to parse some of a file's contents before it can append anything. So if you read first and then append, you can save time by keeping the file open (open it with "r+" right away):
f = jldopen(fn, "r+")
try
    read_data = JLD2.loadnesteddict(f)
    # append
finally
    close(f)
end
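To make the "# append" step concrete, here is a minimal sketch of reading and appending in one "r+" session. The file path and the dataset names (`first_batch`, `second_batch`) are made up for illustration, and it assumes a JLD2 version that provides `jldsave`.

```julia
using JLD2

# Hypothetical file and dataset names, for illustration only.
fn = joinpath(mktempdir(), "example.jld2")
jldsave(fn; first_batch = collect(1:3))

# Open once in "r+" mode, look at what is there, then append in the same session.
existing = String[]
jldopen(fn, "r+") do f
    append!(existing, keys(f))         # list the datasets already in the file
    f["second_batch"] = collect(4:6)   # append through the already-open handle
end

# Re-open read-only to confirm the appended dataset was written.
second = jldopen(fn, "r") do f
    f["second_batch"]
end
```

The do-block form closes the file even if an error is thrown, so it does the same job as the try/finally version above.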
Also, for your use case, JLD2 and HDF5 should be compatible, e.g. you can write files with JLD2 and read (edit?) them with HDF5. (The reverse, writing with HDF5 and reading with JLD2, is also possible, but editing is not.)
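A small sketch of the JLD2-write / HDF5-read direction, assuming both JLD2.jl and HDF5.jl are installed. The path and the dataset name `values` are hypothetical, and this only covers a plain numeric array; custom structs are stored with JLD2-specific type information, so they may not come back as nicely through HDF5.

```julia
using JLD2, HDF5

# Hypothetical file and dataset name, for illustration only.
fn = joinpath(mktempdir(), "compat.jld2")
jldsave(fn; values = [1.0, 2.0, 3.0])

# Read the same file back through HDF5.jl instead of JLD2.
values_via_hdf5 = h5open(fn, "r") do h
    read(h, "values")
end
```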
This post made me curious, so I ran some tests on my rather plain data (all DataFrames).
The numbers > 1.0 in the three rightmost columns are all in favour of Arrow.jl in my case.
Timings were measured using @elapsed. File size is as reported by stat, so the largest JLD2 file is 3.7 GB.
I note that the identifier column is intentionally partially hidden.
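For anyone who wants to repeat this kind of measurement, here is a rough sketch of the @elapsed plus stat approach using only Base. The payload and path are stand-ins (random bytes rather than my DataFrames); swap in the actual JLD2 or Arrow write and read calls.

```julia
# Hypothetical payload and path; replace the write/read calls with the real ones.
path = joinpath(mktempdir(), "payload.bin")
payload = rand(UInt8, 1_000_000)

# @elapsed returns wall-clock seconds for the expression it wraps.
write_seconds = @elapsed write(path, payload)
read_seconds = @elapsed read(path)

# stat reports the on-disk size in bytes.
size_bytes = stat(path).size
size_gb = size_bytes / 1e9
```

Note that the first call of a function includes compilation time, so it is worth running each measurement twice and keeping the second number.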