Datatype when saving matrices to an HDF5 file

Hi,

I am trying to write an HDF5 file in Julia. Suppose I want to create a dataset mydataset, containing a matrix A, in the group mygroup. What datatype should I pass to create_dataset? (In the documentation, the signature is create_dataset(parent, path, datatype, dataspace; properties…).)

Also, suppose I want to save an object that is a list of matrices [A1, A2, A3, A4]. What datatype should I use?

Finally, if A is a static array, should I convert it to a normal array before saving it to the HDF5 file? I want to make sure that I can read the dataset in Python.

Thanks

Hello!

Usually you don’t need to call create_dataset directly; you can simply store your matrix in the HDF5.File using indexing:

julia> using HDF5

julia> h5 = h5open("/tmp/x.h5", "w")
🗂️ HDF5.File: (read-write) /tmp/x.h5

julia> h5["m"] = rand(2,3,4)
2×3×4 Array{Float64, 3}:
[:, :, 1] =
 0.636573  0.539704  0.134521
 0.705768  0.795797  0.186137

[:, :, 2] =
 0.359359   0.0768516  0.542245
 0.0529602  0.882116   0.475718

[:, :, 3] =
 0.698961  0.992394  0.198544
 0.235275  0.455456  0.88217

[:, :, 4] =
 0.515136  0.495796  0.0160039
 0.692038  0.356454  0.915089

julia> size(h5["m"])
(2, 3, 4)
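
If you do want to call create_dataset explicitly, the datatype can be derived from the element type of the matrix and the dataspace from its size. Here is a minimal sketch, assuming a Float64 matrix and reusing the names mygroup and mydataset from the question; the temporary file path is only a placeholder:

```julia
using HDF5

A = rand(2, 3)  # example matrix standing in for the A from the question

path = joinpath(mktempdir(), "explicit.h5")  # placeholder path for illustration

h5open(path, "w") do file
    g = create_group(file, "mygroup")
    # Datatype from the element type, dataspace from the array size.
    dset = create_dataset(g, "mydataset", datatype(eltype(A)), dataspace(size(A)))
    dset[:, :] = A  # write the data into the newly created dataset
end

# Read the dataset back to check the round trip.
A_read = h5open(path, "r") do file
    read(file["mygroup/mydataset"])
end
```

The explicit form is mostly useful when you need extra properties such as chunking; for a plain matrix, the indexing syntax above does the same job with less code.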

This also answers your second question: you cannot serialize a vector of matrices directly, but if the matrices all have the same size, you can stack them along a third dimension and store them as a single 3D array. If that doesn’t work, you need to manually assign subkeys to your arrays/matrices:

for (i, A) in enumerate([A1, A2, A3, A4])
    h5["A/$i"] = A
end
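
Both options can be sketched together as follows. The four matrices here are random placeholders for A1 to A4, and the file path and the dataset name "stacked" are assumptions made for the illustration:

```julia
using HDF5

# Four example matrices of identical size (placeholders for A1..A4).
A1, A2, A3, A4 = (rand(2, 3) for _ in 1:4)

path = joinpath(mktempdir(), "stacked.h5")  # placeholder path for illustration

h5open(path, "w") do h5
    # Option 1: concatenate along a new third dimension and store one 3D array.
    h5["stacked"] = cat(A1, A2, A3, A4; dims=3)

    # Option 2: one dataset per matrix under a common group "A".
    for (i, A) in enumerate([A1, A2, A3, A4])
        h5["A/$i"] = A
    end
end

# Read back the stacked array and the third individual matrix.
stacked, third = h5open(path, "r") do h5
    read(h5["stacked"]), read(h5["A/3"])
end
```

With the stacked layout, matrix i is recovered as stacked[:, :, i]; the per-key layout is the one to use when the matrices differ in size.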

As for the third question, I have no experience with static arrays, but I would assume HDF5 takes care of whatever conversion is necessary. That being said, it should be fairly straightforward to try it out.
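
If you would rather not rely on that, converting explicitly is a safe fallback. A minimal sketch, assuming the StaticArrays package is installed; I have not checked whether HDF5.jl accepts an SMatrix directly, so the conversion with Array is done up front, and the dataset name "static" is just a placeholder:

```julia
using HDF5
using StaticArrays

S = @SMatrix rand(2, 3)  # example static matrix
B = Array(S)             # convert to an ordinary Matrix{Float64}

path = joinpath(mktempdir(), "static.h5")  # placeholder path for illustration

h5open(path, "w") do h5
    h5["static"] = B  # stored like any other plain array
end

# Read the dataset back to check the round trip.
B_read = h5open(path, "r") do h5
    read(h5["static"])
end
```

One thing to keep in mind when reading the file from Python: Julia stores arrays in column-major order while HDF5 uses row-major order, so the dimensions appear reversed. A 2×3 matrix written from Julia shows up with shape (3, 2) in h5py.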

HTH.


Hi David. Thanks a lot for the help. I will play with what you said.
