Thank you @tamasgal,
reports of bad experiences are valuable to me, because they help us stay away from those patterns.
I agree with you that, right now, JLD2's purpose is internal / short-to-medium-term storage, and I would not recommend publishing data in that format.
I agree, but in my opinion this is largely a question of tooling rather than a problem with the library/spec itself. On the contrary, I think that with the HDF5.Group structure you can represent nested structures quite easily.
About JLD2:
JLD2 implements a subset of the HDF5 spec, with additional Julia-specific features for encoding type information.
Nonetheless, JLD2 files are already valid HDF5 files.
```julia
julia> using JLD2, HDF5

julia> numbers = rand(5)
5-element Array{Float64,1}:
 0.35527512738045885
 0.31367145115382966
 0.49535895899812266
 0.4038095087610356
 0.4229428296143374

julia> hello = "world"
"world"

julia> @save "test.jld2" numbers hello

julia> f = h5open("test.jld2", "r")
HDF5 data file: test.jld2

julia> names(f)
2-element Array{String,1}:
 "hello"
 "numbers"

julia> read(f, "hello")
5-element Array{String,1}:
 "world"
 "\x7f"
 "\xbd\x03"
 ""
 ""

julia> read(f, "numbers")
5-element Array{Float64,1}:
 0.35527512738045885
 0.31367145115382966
 0.49535895899812266
 0.4038095087610356
 0.4229428296143374
```
```
~ h5dump test.jld2
HDF5 "test.jld2" {
GROUP "/" {
   DATASET "hello" {
      DATATYPE H5T_STRING {
         STRSIZE 5;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE SIMPLE { ( 5 ) / ( 5 ) }
      DATA {
      (0): "world", "\000\000\000\000\000", "\000\000\000\000\000",
      (3): "\000\000\000\000\000", "\000\000\000\000A"
      }
   }
   DATASET "numbers" {
      DATATYPE H5T_IEEE_F64LE
      DATASPACE SIMPLE { ( 5 ) / ( 5 ) }
      DATA {
      (0): 0.355275, 0.313671, 0.495359, 0.40381, 0.422943
      }
   }
}
}
```
Of course, as you can see with the string, at the moment you can retrieve the data, but it is not entirely straightforward.
When Julia structs get involved, it gets a bit more complicated.
```julia
julia> struct S; x::Int; y::Float64; z::String; end

julia> s = S(1, 2.0, "3")
S(1, 2.0, "3")

julia> @save "test2.jld2" s
```
```
~ h5dump test2.jld2
HDF5 "test2.jld2" {
GROUP "/" {
   GROUP "_types" {
      DATATYPE "00000001" H5T_COMPOUND {
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLPAD;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         } "name";
         H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }} "parameters";
      }
      ATTRIBUTE "julia_type" {
         DATATYPE "/_types/00000001"
         DATASPACE SCALAR
         DATA {
         (0): {
               "Core.DataType",
               ()
            }
         }
      }
      DATATYPE "00000002" H5T_COMPOUND {
         H5T_STD_I64LE "x";
         H5T_IEEE_F64LE "y";
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLPAD;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         } "z";
      }
      ATTRIBUTE "julia_type" {
         DATATYPE "/_types/00000001"
         DATASPACE SCALAR
         DATA {
         (0): {
               "Main.S",
               ()
            }
         }
      }
   }
   DATASET "s" {
      DATATYPE "/_types/00000002"
      DATASPACE SCALAR
      DATA {
      (0): {
            1,
            2,
            "3"
         }
      }
   }
}
}
```
In a way, this is still self-descriptive. At least JLD2 can typically reconstruct types that are not defined in the current session, but in a different language this could be a major undertaking.
So my dream for the future would be to
- Improve JLD2 for Julia-internal usage (there are plenty of outstanding issues)
- Build and combine existing tooling to make it easy to produce HDF5-compatible structures for long-term storage and publication (e.g. unrolling nested structures into groups)

Long-term dream:
- Provide a more fully featured HDF5 implementation, to also be able to read HDF5 files not produced with JLD2
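
To make the "unrolling nested structures into groups" idea a bit more concrete, here is a minimal sketch in plain Julia. It is not existing JLD2 or HDF5.jl functionality: the types `Inner` and `Outer` and the helpers `isleaf`, `unroll!` and `unroll` are hypothetical names made up for illustration. The sketch only computes the group paths; it assumes that numbers, strings and arrays should each end up as one dataset and that every other struct becomes a group.

```julia
# Hypothetical example types; they are not part of JLD2 or HDF5.jl.
struct Inner
    a::Int
    b::String
end

struct Outer
    id::Int
    inner::Inner
    values::Vector{Float64}
end

# Assumption: numbers, strings and arrays are stored as a single dataset,
# as is anything that has no fields to descend into.
isleaf(x) = x isa Union{Number,AbstractString,AbstractArray} ||
            fieldcount(typeof(x)) == 0

# Recursively collect "group/path" => value pairs. Every struct contributes
# a path prefix (a group) and every leaf contributes one dataset path.
function unroll!(out::Dict{String,Any}, path::AbstractString, x)
    if isleaf(x)
        out[path] = x
    else
        for name in fieldnames(typeof(x))
            unroll!(out, string(path, "/", name), getfield(x, name))
        end
    end
    return out
end

unroll(name::AbstractString, x) = unroll!(Dict{String,Any}(), name, x)

# Usage: flatten one nested value and list the resulting dataset paths.
flat = unroll("s", Outer(1, Inner(2, "three"), [4.0, 5.0]))
for path in sort!(collect(keys(flat)))
    println(path, " => ", flat[path])
end
```

Each resulting path could then be written as an ordinary dataset with an HDF5 library, so that readers in other languages see plain groups and datasets instead of Julia-specific compound types. The price is that the Julia type information is lost unless it is stored separately, for example as attributes.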