I’m trying to write a program to read in an HDF5 file with a lot of complicated metadata, do some processing on a large dataset, and write the metadata and modified dataset back out to a new file. I want to avoid explicitly copying each piece of metadata or implementing some generic thing to copy each object (except the interesting dataset) one by one.
My first try was to copy the file, delete the dataset, and write a new one, like so:

```julia
using HDF5

function update_dataset1(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    o_delete(d)              # unlink the old dataset
    write(f, dataset, newd)  # write the replacement under the same name
    close(f)
end
```
But this basically doubles the file size, because `o_delete` calls `H5Ldelete`, which only deletes the link to the dataset and not the actual object written to the file (which becomes unreachable). I could write to a temporary file and then shell out to `h5repack`, I guess; a rough sketch of that is below.
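For concreteness, the `h5repack` route I have in mind would be something like this (untested sketch; the helper name `update_dataset_repack` and the temporary-file handling are just illustrative, and it assumes the `h5repack` command-line tool is on the `PATH`):

```julia
using HDF5

# Untested sketch: do the delete-and-rewrite in a temporary copy, then let
# h5repack rebuild the file so the orphaned space is not carried over.
function update_dataset_repack(src, dest, dataset)
    tmp = tempname() * ".h5"
    cp(src, tmp, follow_symlinks=true, force=true)
    h5open(tmp, "r+") do f
        d = f[dataset]
        newd = zeros(eltype(d), size(d)...)
        o_delete(d)              # same unlink-and-rewrite as update_dataset1
        write(f, dataset, newd)
    end
    run(`h5repack $tmp $dest`)   # h5repack <infile> <outfile> drops the unreachable space
    rm(tmp)
end
```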
I also tried the following:
```julia
function update_dataset2(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    d .= newd   # this is the line that throws
end
```
But this dies with a `MethodError`: the broadcast assignment lowers to a `copyto!` call, and there is no method matching the resulting types. I can actually do this in Python like so:
```python
import shutil
import h5py
import numpy as np

def update_dataset3(src, dest, dataset):
    shutil.copy(src, dest)
    f = h5py.File(dest, "r+")
    d = f[dataset]
    newd = np.zeros_like(np.asarray(d))
    d[:, :] = newd   # in-place overwrite of the dataset's contents
    f.close()
```
So I'm wondering if there's just some interface in HDF5.jl that I'm missing.
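For concreteness, the kind of interface I'm hoping for would mirror the h5py slice assignment above; something like this (hypothetical, just to show the shape of the call I'm looking for, with `dest` and `dataset` as in the functions above):

```julia
using HDF5

# Hypothetical: the kind of in-place overwrite I'm hoping HDF5.jl exposes,
# mirroring h5py's d[:, :] = newd above.
f = h5open(dest, "r+")
d = f[dataset]
d[:, :] = zeros(eltype(d), size(d)...)   # overwrite contents without relinking
close(f)
```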