Saving and updating in-memory HDF5 files?

Hello!

Reading the HDF5.jl documentation (Home · HDF5.jl), I see that it is possible to have an in-memory HDF5 file. I was wondering:

  1. Can I update a dataset in an HDF5 file in place?
  2. Can I save a copy of the current in-memory HDF5 file to disk?

I have a use case in which, over N timesteps, I have a fixed number of data points I want to save. I thought that HDF5 would be perfect for this, especially if the updates could be done in place.

Kind regards

You should first familiarize yourself with the upstream documentation.

https://docs.hdfgroup.org/hdf5/develop/_h5_f__u_g.html#subsubsec_file_alternate_drivers_mem

The example shows you how you can obtain a Vector{UInt8} of the file image. If you write that out to disk, you can open it just like a regular HDF5 file.
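As a minimal sketch of that idea, assuming a recent HDF5.jl version that provides the Core driver and the `Vector{UInt8}(fid)` conversion described in its documentation: the file is built in memory, its image is written to disk with plain `write`, and the result is reopened like any other file. The file name `snapshot.h5`, the dataset name `data`, and the temporary directory are placeholders for illustration.

```julia
using HDF5

# Build an HDF5 file purely in memory (no backing store on disk) and
# return its file image as a byte vector.
image = h5open("in_memory.h5", "w"; driver=HDF5.Drivers.Core(; backing_store=false)) do fid
    fid["data"] = collect(1.0:5.0)
    Vector{UInt8}(fid)
end

# Writing the bytes out yields an ordinary HDF5 file.
path = joinpath(mktempdir(), "snapshot.h5")   # hypothetical output location
write(path, image)

# Reopen it like any other file on disk.
restored = h5open(path, "r") do fid
    read(fid, "data")
end
```

Note that each snapshot copies the whole file image, so this is mainly useful when the bytes are needed anyway, for example to send them over a network.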

This is a highly specialized operation, and I’m not sure it fits your specific use case. Even for an HDF5 file on disk, you can update a subsection of a dataset in place.
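A rough sketch of such an in-place update on disk, assuming the data form a fixed-size `npoints × nsteps` array: the dataset is allocated once with `create_dataset`, and each timestep overwrites only its own column. The sizes, the dataset name `positions`, and the file name are made up for illustration.

```julia
using HDF5

npoints, nsteps = 4, 3                          # hypothetical sizes
path = joinpath(mktempdir(), "simulation.h5")   # hypothetical output file

h5open(path, "w") do fid
    # Allocate the full dataset once; it is filled step by step below.
    dset = create_dataset(fid, "positions", datatype(Float64), dataspace(npoints, nsteps))
    for step in 1:nsteps
        # Overwrite only the column that belongs to this timestep.
        dset[:, step] = fill(Float64(step), npoints)
    end
end

positions = h5read(path, "positions")
```

Writing whole columns matches Julia's column-major layout, so each timestep touches one contiguous slice of the dataset.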

Before we get into an XY problem situation, could you explain what you are trying to do or optimize?

Thanks!

What I actually want to do is save a data file at each output step of my simulation. Since I know the number of data points beforehand and it never changes, I thought I could “preallocate” an HDF5 file in memory, efficiently overwrite it in place, then make a copy and save that to disk. I thought that doing so would give me an extra speedup and reduce allocations.

Kind regards

I found that a simple solution for fast file writing using HDF5.jl, without resorting to the complexity above, is to save all data into one single file. In pseudo-code:

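using HDF5

# Write each entry of `args` as a dataset named by the matching entry of
# `variable_names`, inside a new group `group_name` of the already-open file `fid`.
function SaveHDF5!(fid::HDF5.File, group_name, variable_names, args...)
    group = create_group(fid, group_name)
    for (var_name, arg) in zip(variable_names, args)
        group[var_name] = arg
    end
end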

Here I pass in the same file handle every time and change only the group name and the variables as needed. This gave me the following timings:

The reason for writing to one file is that, surprisingly, at least on Windows, closing the HDF5.File is actually a bottleneck.
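A small sketch of that pattern, where the file is opened once, every timestep is written into its own group, and the file is closed only once at the end. The helper `save_step!` is a hypothetical stand-in in the spirit of SaveHDF5! above, and the group names, variable names, and data are invented for illustration.

```julia
using HDF5

# Hypothetical helper in the spirit of SaveHDF5! above: one group per timestep.
function save_step!(fid::HDF5.File, group_name, variable_names, args...)
    group = create_group(fid, group_name)
    for (name, arg) in zip(variable_names, args)
        group[name] = arg
    end
end

path = joinpath(mktempdir(), "output.h5")   # hypothetical output file
nsteps = 3

# Open once, write every timestep, close once when the do-block ends.
h5open(path, "w") do fid
    for step in 1:nsteps
        pos = fill(Float64(step), 4)    # stand-in simulation data
        vel = fill(-Float64(step), 4)
        save_step!(fid, "step_$(step)", ("position", "velocity"), pos, vel)
    end
end

group_names = h5open(path, "r") do fid
    sort(keys(fid))
end
```

If intermediate results need to reach the disk before the run finishes, calling `flush(fid)` after a step may be an alternative to closing and reopening the file.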

All in all, I am pretty pleased: writing to HDF5 in this way is about 10x faster than writing to individual .vtp files, as I did in the past.

Just sharing my findings here, perhaps someone can benefit in the future.
