I’m trying to write a program that reads in an HDF5 file with a lot of complicated metadata, does some processing on a large dataset, and writes the metadata and modified dataset back out to a new file. I want to avoid explicitly copying each piece of metadata, or writing some generic routine that copies every object (except the dataset of interest) one by one.
My first try was to copy the file, delete the dataset, and write a new one, like so:
```julia
using HDF5

function update_dataset1(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    o_delete(d)              # unlink the old dataset
    write(f, dataset, newd)  # write the replacement
    close(f)
end
```
But this basically doubles the file size, because deleting the dataset calls `H5Ldelete`, which only removes the link to the dataset, not the object actually written to the file (which just becomes unreachable). I could write to a temporary file and shell out to `h5repack` it, I guess. I also tried the following:
```julia
function update_dataset2(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    d .= newd  # overwrite the dataset's contents in place
end
```
But this dies with a `MethodError`: no method matching `copyto!` for the relevant types. I can actually do this in Python like so:
```python
import shutil

import h5py
import numpy as np

def update_dataset3(src, dest, dataset):
    shutil.copy(src, dest)
    f = h5py.File(dest, "r+")
    d = f[dataset]
    newd = np.zeros_like(np.asarray(d))
    d[:, :] = newd  # overwrite in place, reusing the existing storage
```
so I’m wondering if there’s just some interface in HDF5.jl that I’m missing.