I wanted to write vectors of LabelledArrays to HDF5 files, preserving the labels, and be able to read these back and reconstruct the original data. My data is pretty much entirely numeric.
I’ve struggled to find any relevant documentation; the gap is acknowledged in an issue (#819) in HDF5.jl:
"We are certainly lacking examples on writing compound data types in the documentation."
Indeed - the only mention at all is the (read and write) support for Complex. But I found that you can write arrays of NamedTuple and they end up being HDF5 array datasets with a compound datatype, as you’d expect. There are some restrictions, e.g. the fields can’t be strings.
Conversely, you can read such a dataset, even one with a string field, and you get out a vector of NamedTuple including the string field.
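A minimal sketch of the all-numeric round trip described above, using only the `write_dataset` and `read` calls that appear elsewhere in this post. The field names, values, dataset name and temporary file path are made up for illustration, and since this behaviour is undocumented it may differ between HDF5.jl versions.

```julia
using HDF5

# Hypothetical records: every field is a plain numeric (bits) type.
records = [
    (id = 1, x = 0.5, y = 1.5),
    (id = 2, x = 2.5, y = 3.5),
]

# Temporary file so the sketch does not clobber anything.
path = tempname() * ".h5"

# Write the vector of NamedTuples; it is stored as a dataset with a compound datatype.
h5open(path, "w") do h5f
    write_dataset(h5f, "records", records)
end

# Read it back; the result is again a vector of NamedTuples.
back = h5open(path, "r") do h5f
    read(h5f, "records")
end

println(typeof(back))
println(back == records)
```

Comparing `back` with `records` is a quick way to check that the labels and values survived on whatever HDF5.jl version is installed.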
So for example:
# File downloaded from https://www.neonscience.org/resources/learning-hub/tutorials/hdf5-intro-python
julia> using HDF5
julia> fn = "/Users/patrick/Desktop/NEONDSTowerTemperatureData.hdf5";
julia> data = h5open(fn, "r") do h5f
           read(h5f, "Domain_03/OSBS/min_1/boom_1/temperature")
       end;
julia> typeof(data), size(data)
(Vector{NamedTuple{(:date, :numPts, :mean, :min, :max, :variance, :stdErr, :uncertainty), Tuple{String, Int32, Vararg{Float64, 6}}}}, (4323,))
But the reverse doesn’t work (unless the string-typed field, date, is removed):
julia> h5open("test.h5", "w") do h5f
           write_dataset(h5f, "test_dataset", data)
       end
ERROR: ArgumentError: Could not convert non-bitstype NamedTuple{(:date, :numPts, :mean, :min, :max, :variance, :stdErr, :uncertainty), Tuple{String, Int32, Vararg{Float64, 6}}} to NamedTuple{(:date, :numPts, :mean, :min, :max, :variance, :stdErr, :uncertainty), Tuple{HDF5.FixedString{1, 0}, Int32, Vararg{Float64, 6}}} for writing to HDF5. Consider implementing `convert(::Type{NamedTuple{(:date, :numPts, :mean, :min, :max, :variance, :stdErr, :uncertainty), Tuple{HDF5.FixedString{1, 0}, Int32, Vararg{Float64, 6}}}}, ::NamedTuple{(:date, :numPts, :mean, :min, :max, :variance, :stdErr, :uncertainty), Tuple{String, Int32, Vararg{Float64, 6}}})`
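The error refers to a "non-bitstype" element type, so one way to anticipate it is to check `isbitstype(eltype(v))` before writing and to drop any field that is not a bits type. The sketch below uses only Base Julia; `keep_bits_fields` is a hypothetical helper name, and the sample rows are invented stand-ins shaped like the NEON records, not the real data.

```julia
# Hypothetical rows shaped like the NEON data (values are made up).
rows = [
    (date = "2014-04-01", numPts = Int32(60), mean = 15.1),
    (date = "2014-04-02", numPts = Int32(60), mean = 16.3),
]

# Hypothetical helper: keep only the fields whose declared type is a bits type.
function keep_bits_fields(v::AbstractVector{<:NamedTuple})
    T = eltype(v)
    keep = Tuple(n for (n, ft) in zip(fieldnames(T), fieldtypes(T)) if isbitstype(ft))
    # NamedTuple{names}(nt) selects just the named fields from nt.
    return [NamedTuple{keep}(nt) for nt in v]
end

numeric_rows = keep_bits_fields(rows)

println(isbitstype(eltype(rows)))          # the String field makes this false
println(isbitstype(eltype(numeric_rows)))  # only Int32 and Float64 fields remain
println(keys(numeric_rows[1]))
```

Dropping the field loses the dates, of course; this only shows which records the write path appears to accept.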
So for my use case (all the fields will be numeric types, all Int and Float) the functionality for read and write is there, but I’m hesitant to use it given that it’s not documented.
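For the LabelledArrays side, the idea would be to convert each labelled vector to a NamedTuple before writing and rebuild it after reading. This is a rough sketch and rests on an assumption: that `convert(NamedTuple, lv)` is supported for an `LVector`, as I understand from the LabelledArrays README. The labels and values are made up, and all values are Float64 to sidestep element-type promotion questions.

```julia
using LabelledArrays

# Hypothetical labelled vectors with identical labels.
lvs = [LVector(a = 1.0, b = 2.0), LVector(a = 3.0, b = 4.0)]

# Assumption: LabelledArrays defines convert(NamedTuple, ::LArray).
nts = [convert(NamedTuple, lv) for lv in lvs]

# A vector like `nts` (all-numeric fields) is what would be passed to write_dataset.
println(typeof(nts))

# Rebuild labelled vectors from the NamedTuples via keyword splatting.
rebuilt = [LVector(; nt...) for nt in nts]
println(rebuilt[1].a, " ", rebuilt[1].b)
```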
From what I can tell from the PRs, compound datatype reading and writing was added deliberately. What I’m not clear on is whether the lack of documentation is just an oversight, or whether the functionality isn’t meant for external use (maybe because of the problem with writing strings?).
I’m hoping it’s the former and I can just start using this as-is (and contribute a documentation PR). Does anyone happen to know?
(Edit: cross-referenced this in a comment to the above-mentioned issue).