How to modify HDF5 dataset

bhawkins · March 24, 2020, 3:05am

I’m trying to write a program to read in an HDF5 file with a lot of complicated metadata, do some processing on a large dataset, and write the metadata and modified dataset back out to a new file. I want to avoid explicitly copying each piece of metadata or implementing some generic thing to copy each object (except the interesting dataset) one by one.

My first try was to copy the file, delete the dataset, and write a new one, like so:

using HDF5

function update_dataset1(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    o_delete(d)
    write(f, dataset, newd)
end

But this basically doubles the file size, because o_delete calls H5Ldelete, which only removes the link to the dataset and not the actual object written to file. The old data becomes unreachable, but its space is not reclaimed. I could write to a temporary file and shell out to h5repack to compact it, I guess. I also tried the following:

function update_dataset2(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    f = h5open(dest, "r+")
    d = f[dataset]
    newd = zeros(eltype(d), size(d)...)
    d .= newd
end

But this fails with a MethodError: no method of copyto! matches the argument types. I can actually do this in Python like so:

import shutil
import h5py
import numpy as np

def update_dataset3(src, dest, dataset):
    shutil.copy(src, dest)
    # The with-block closes the file when done.
    with h5py.File(dest, "r+") as f:
        d = f[dataset]
        newd = np.zeros_like(np.asarray(d))
        d[:,:] = newd

so I’m wondering if there’s just some interface in HDF5.jl that I’m missing.

bhawkins · March 24, 2020, 5:31am

I noticed in HDF5.jl that setindex! is defined, and so I tried

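function update_dataset4(src, dest, dataset)
    cp(src, dest, follow_symlinks=true, force=true)
    # The do-block closes the file even if an error is thrown.
    h5open(dest, "r+") do f
        d = f[dataset]
        newd = zeros(eltype(d), size(d)...)
        d[:,:] = newd  # indexed assignment, which calls setindex! on the dataset
    end
end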

and it does what I want.

So now I’m just wondering what’s the difference between x[:] = y and x .= y. I think the first calls setindex! while the other calls copyto!, but I don’t really understand why these are distinct methods.
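To make the distinction concrete, here is a minimal sketch in plain Julia, with no HDF5 involved. FakeDataset is a hypothetical stand-in I made up for illustration: like the dataset handle above, it is not an AbstractArray and only implements size and setindex!. Indexed assignment works on it because it lowers to a setindex! call, while broadcast assignment goes through the broadcasting machinery, which the type does not support. Which method turns out to be missing may differ between Julia versions, so the sketch only checks that a MethodError is thrown.

```julia
# Hypothetical stand-in for an on-disk dataset (not part of HDF5.jl).
# It is NOT an AbstractArray and only supports size and setindex!.
struct FakeDataset
    data::Matrix{Float64}
end

Base.size(d::FakeDataset) = size(d.data)

function Base.setindex!(d::FakeDataset, v, I...)
    d.data[I...] = v
    return d
end

d = FakeDataset(ones(2, 3))
newd = zeros(2, 3)

# Indexed assignment lowers to setindex!(d, newd, :, :), which is defined above.
d[:, :] = newd
@assert all(iszero, d.data)

# Broadcast assignment lowers to
#   Base.materialize!(d, Base.broadcasted(identity, ...))
# which needs broadcasting support (ultimately a copyto! method) for the
# destination type. FakeDataset has none, so a MethodError is expected.
broadcast_failed = try
    d .= ones(2, 3)
    false
catch err
    err isa MethodError
end
@assert broadcast_failed
```

So the two spellings look alike for ordinary arrays only because Array implements both code paths; a type that defines just setindex! gets the first and not the second.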

Edit: Also, is there a shorthand for “all indices of all dimensions” like the ... in numpy?
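One way to spell "all indices of all dimensions" using only Base Julia is to build one Colon per dimension and splat the tuple into the index. The sketch below does this on an ordinary Array; set_all! is a hypothetical helper name, and I have not checked that an HDF5.jl dataset accepts the same call (it would need ndims and a setindex! method taking colons). The EllipsisNotation.jl package also provides a `..` index for arrays, though whether it works on HDF5 datasets is likewise an assumption to verify.

```julia
# Hypothetical helper: assign src to every element of dest,
# regardless of how many dimensions dest has.
function set_all!(dest, src)
    allidx = ntuple(_ -> Colon(), ndims(dest))  # (:, :, ..., :) with one Colon per dimension
    dest[allidx...] = src                       # same call as dest[:, :, ..., :] = src
    return dest
end

A = rand(3, 4, 5)
B = zeros(size(A))
set_all!(A, B)
@assert all(iszero, A)
```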
