Reading the documentation of HDF5.jl (Home · HDF5.jl), I see that it is possible to have an in-memory HDF5 file. I was wondering:
Can I update a dataset in an HDF5 file in place?
Can I save a copy of the current in-memory hdf5 file to disk?
I have a use case in which, over N timesteps, I have a fixed number of data points I want to save. I thought that HDF5 would be perfect for this, especially if the updates could be done in place.
The example shows how to obtain a Vector{UInt8} of the file image. If you write those bytes out to disk, you can open the result just like a regular HDF5 file.
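For instance, assuming you already have the file image as a `Vector{UInt8}` (obtained as in the documentation example), persisting it is a plain byte write. This sketch fakes the in-memory step by round-tripping a small file through a temporary path; the dataset name `x` and the paths are illustrative:

```julia
using HDF5

# Stand-in for obtaining an in-memory file image: write a small HDF5
# file and read its raw bytes back.
tmp = tempname() * ".h5"
h5open(tmp, "w") do fid
    fid["x"] = collect(1.0:10.0)
end
buf = read(tmp)               # Vector{UInt8}: the raw HDF5 file image

# Writing the image bytes to disk yields a regular HDF5 file.
out = tempname() * ".h5"
write(out, buf)
x = h5open(out, "r") do fid
    read(fid["x"])
end
```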
This is a highly specialized operation, and I’m not sure about your specific use case. Even for an HDF5 file on disk, you can update a subsection of a dataset in place.
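A minimal sketch of such an in-place update of a dataset slice on disk (the file path, dataset name, and sizes are illustrative):

```julia
using HDF5

path = tempname() * ".h5"
h5open(path, "w") do fid
    # Preallocate a 100×10 Float64 dataset on disk.
    create_dataset(fid, "data", Float64, (100, 10))
end

# Reopen read-write and overwrite a single column in place,
# without rewriting the rest of the dataset.
h5open(path, "r+") do fid
    fid["data"][:, 1] = fill(2.5, 100)
end

col = h5open(path, "r") do fid
    read(fid["data"])[:, 1]
end
```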
Before we get into an XY problem situation, could you explain what you are trying to do or optimize?
What I want to do in reality is save data files at each output step of my simulation. Since I know the number of data points beforehand and it never changes, I thought I could “preallocate” an HDF5 file in memory, efficiently overwrite it in place, then make a copy and save it to disk. I thought that by doing so I could get an extra speed-up and reduce allocations.
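That pattern can be sketched with the Core (in-memory) file driver. I'm assuming the `Drivers.Core` keyword interface from recent HDF5.jl versions here, so treat this as a sketch under those assumptions rather than a tested recipe; with `backing_store=true`, the in-memory image is flushed to the named path when the file is closed:

```julia
using HDF5
using HDF5: Drivers

# Open an in-memory HDF5 file; backing_store=true means the image is
# written to `path` on close.
path = tempname() * ".h5"
fid = h5open(path, "w"; driver=Drivers.Core(; backing_store=true))

# Preallocate once, then overwrite in place at each timestep.
create_dataset(fid, "points", Float64, (100,))
for step in 1:5
    fid["points"][:] = fill(Float64(step), 100)   # in-place overwrite
end
close(fid)   # flushes the final image to disk

vals = h5open(path, "r") do f
    read(f["points"])
end
```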
I found that a simple solution for fast file writing with HDF5.jl, without resorting to the complexity above, is to save all data into one single file:
```julia
using HDF5

# Write each variable as a dataset under a new group in an open file.
function SaveHDF5!(fid::HDF5.File, group_name, variable_names, args...)
    g = create_group(fid, group_name)
    for (var_name, arg) in zip(variable_names, args)
        g[var_name] = arg
    end
end
```
Here I pass in a persistent file handle and update the group name and the variable inputs as needed. This gives me the following timings: