HDF5 high-level wrapper for nested Dicts

I’m storing some data in HDF5 format for use in a multi-language environment (others will use Python, R, C, etc. to access those files). I’ve implemented the export using HDF5.jl without much trouble, but the code is not very well organised (compared to the R or Matlab versions of the same code). I would like to collect the objects to be saved in dictionaries, to make it tidier. Unfortunately, I am unable to get HDF5.jl to save dictionaries directly, and the function I tried to write does not save anything.

Here’s some dummy data, organised as I would like it, and my attempts:

using HDF5

# simple objects
a  = collect(400:50:800)
Nl = length(a)

m = reshape(repeat(collect(1:900.0), Nl), (30,30,Nl))

# now a dictionary, which should map to a group called "g"
g = Dict("shape" => "spheroid", 
                "a" => 20.0,
                "c" => 40.0)

# another dictionary, which should map to a group called "e"
# this one is nested, should have 2 subgroups
e = Dict(
    "medium" => Dict("epsilon" => 1.0, "mu" => 1.0),
    "particle" => Dict("epsilon" => 1.0, "mu" => 1.0))

# ideally I'd group everything in a top-level Dict
# but that's optional
allthestuff = Dict("a" => a, "m" => m, "g" => g, "e" => e)

## ideal situation
# f = "debug.h5"
# fid = h5open(f, "w")
# save_dict_toh5(fid, allthestuff)
# close(fid)

## manual process (works, but messy)

f = "debug.h5"
fid = h5open(f, "w")
    
# assign simple datasets
fid["a"] = a
fid["m"] = m

# for flat Dicts, create a group, then iterate to assign datasets
group1 = create_group(fid, "g")
for (key, value) in g
    group1[key] = value  # write into the HDF5 group, not back into the Dict `g`
end

# for nested Dicts, this becomes a mess
group2 = create_group(fid, "e")
for (key, value) in e
    # ridiculous way to handle nested group
    if value isa Dict
        subgroup = create_group(fid, "e" * "/" * key)
        for (subkey, subvalue) in value
            subgroup[subkey] = subvalue
        end
    else
        group2[key] = value
    end
end

# also store some attributes 
attributes(fid)["what's this"] = "some attributes"
attributes(fid["e"])["unit"] = "kg"

close(fid)

It works, but it’s verbose and inelegant. I’d like to write a high-level wrapper to make some steps more concise.

Here’s my attempt at writing a function to iterate over a (possibly-nested) Dict. It does nothing, unfortunately, which I assume is because it’s operating within the function’s environment instead of top level.

Many thanks.

function assign_dict(group, dict::Dict)

    for (key, value) in group
        if(typeof(value) <: Dict)
            assign_dict(group * "/" * key, value)
        else
        group[key] = value
        end
    end
    return group
end

# intended use within the above:
group2 = create_group(fid, "e") 
assign_dict(group2, e)

Note: I’m aware of the JLD2 package, but it seems to me that the output would be too Julia-centric for my use case, as it seems to encode objects to ensure that I/O maps well to Julia (e.g. structs), but those extra attributes would be undesirable when read from another language.
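
A note on the `assign_dict` attempt above: as far as I can tell, it does nothing because the loop runs over `group` (the freshly created, still-empty HDF5 group) rather than over `dict`, so the body never executes. Scoping does not seem to be the issue, since the group handle refers to the open file wherever it is used. Separately, `group * "/" * key` would fail once reached, because `group` is an `HDF5.Group` and not a string. Below is a minimal corrected sketch under those assumptions; the temporary file path and the read-back check are my additions for the demo.

```julia
using HDF5

# Corrected sketch of assign_dict: iterate over the Dict, and create the
# subgroup from the parent handle rather than concatenating strings.
function assign_dict(group::Union{HDF5.File,HDF5.Group}, dict::AbstractDict)
    for (key, value) in dict
        if value isa AbstractDict
            assign_dict(create_group(group, key), value)  # recurse into a new subgroup
        else
            group[key] = value                            # plain dataset
        end
    end
    return group
end

e = Dict(
    "medium"   => Dict("epsilon" => 1.0, "mu" => 1.0),
    "particle" => Dict("epsilon" => 1.0, "mu" => 1.0))

path = tempname() * ".h5"   # throwaway file, chosen only for this demo
h5open(path, "w") do fid
    assign_dict(create_group(fid, "e"), e)
end

# Read the subgroup names back to confirm something was written
subgroups = h5open(path, "r") do fid
    sort(keys(fid["e"]))
end
```

Here `subgroups` should list the two nested groups, `medium` and `particle`.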

This seems to work for me,

# Note: relies on `fid`, an open HDF5 file handle, being defined in the enclosing (global) scope.
function write_dict(groupname, dict::Dict)
    create_group(fid, groupname)
    for (key, value) in dict
        keyname = groupname * "/" * key
        if value isa Dict # nested, recurse
            write_dict(keyname, value)
        else
            write(fid, keyname, value)
        end
    end
end

Here’s how I would write this, taking advantage of multiple dispatch.

julia> function write_dicts_to_hdf5_groups(
           parent::Union{HDF5.File, HDF5.Group},
           dict::Dict
       )
           for (key, value) in dict
               write_dicts_to_hdf5_groups(parent, key, value)
           end
       end
write_dicts_to_hdf5_groups (generic function with 3 methods)

julia> function write_dicts_to_hdf5_groups(
           parent::Union{HDF5.File, HDF5.Group},
           key::AbstractString,
           dict::Dict
       )
           group = create_group(parent, key)
           write_dicts_to_hdf5_groups(group, dict)
       end
write_dicts_to_hdf5_groups (generic function with 3 methods)

julia> function write_dicts_to_hdf5_groups(
           parent::Union{HDF5.File, HDF5.Group},
           key::AbstractString,
           value
       )
           write(parent, key, value)
       end
write_dicts_to_hdf5_groups (generic function with 3 methods)

julia> h5open("allthestuff.h5", "w") do h5f
           write_dicts_to_hdf5_groups(h5f, allthestuff)
       end

julia> run(`h5ls -r allthestuff.h5`)
/                        Group
/a                       Dataset {9}
/e                       Group
/e/medium                Group
/e/medium/epsilon        Dataset {SCALAR}
/e/medium/mu             Dataset {SCALAR}
/e/particle              Group
/e/particle/epsilon      Dataset {SCALAR}
/e/particle/mu           Dataset {SCALAR}
/g                       Group
/g/a                     Dataset {SCALAR}
/g/c                     Dataset {SCALAR}
/g/shape                 Dataset {SCALAR}
/m                       Dataset {9, 30, 30}
Process(`h5ls -r allthestuff.h5`, ProcessExited(0))

Another approach here would be to encode the dict as a compound datatype. Basically convert the Dict to a NamedTuple and then write that.

Dicts are hard to read and write since we do not know the type of the members up front. A NamedTuple is fully typed.

The big question is whether there is a large array somewhere in the tree; if there is, the current approach (one dataset per entry) would be better.
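
To illustrate only the conversion step, here is a minimal sketch that turns a nested `Dict` with string keys into a nested `NamedTuple`. The helper name `to_namedtuple` is hypothetical (it is not part of HDF5.jl), and sorting the keys is my choice to get a deterministic field order. Writing the result out as a compound datatype is not shown, since that depends on the field types and the HDF5.jl version.

```julia
# Hypothetical helper (not part of HDF5.jl): recursively convert a Dict with
# string keys into a NamedTuple; non-Dict values pass through unchanged.
to_namedtuple(x) = x
function to_namedtuple(d::AbstractDict)
    # Sort the keys so the field order is deterministic (Dict order is not).
    ks = sort!(collect(keys(d)))
    return NamedTuple{Tuple(Symbol.(ks))}(Tuple(to_namedtuple(d[k]) for k in ks))
end

e = Dict(
    "medium"   => Dict("epsilon" => 1.0, "mu" => 1.0),
    "particle" => Dict("epsilon" => 1.0, "mu" => 1.0))

nt = to_namedtuple(e)

# Fields are now accessed by name and carry concrete types
nt.medium.epsilon
```

Unlike the `Dict`, the type of `nt` records the name and type of every field, which is what makes it a natural fit for a fixed-layout record.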

Thanks! I have no reason to use Dict rather than named tuples, so I’ll switch to that. It’s been a while since I wrote some Julia so I simply forgot about them. Your code looks much cleaner, thank you.

Edit: here’s my attempt at adapting the code for named tuples, in case it’s useful for future visitors:

using HDF5

# Top level: write each field of the named tuple under `parent`
function write_namedtuples_to_hdf5_groups(
    parent::Union{HDF5.File,HDF5.Group},
    nt::NamedTuple
)
    for (key, value) in pairs(nt)
        write_namedtuples_to_hdf5_groups(parent, key, value)
    end
end

# A nested named tuple becomes a subgroup
function write_namedtuples_to_hdf5_groups(
    parent::Union{HDF5.File,HDF5.Group},
    key::Symbol,
    nt::NamedTuple
)
    group = create_group(parent, string(key))
    write_namedtuples_to_hdf5_groups(group, nt)
end

# Anything else is written as a dataset
function write_namedtuples_to_hdf5_groups(
    parent::Union{HDF5.File,HDF5.Group},
    key::Symbol,
    value
)
    write(parent, string(key), value)
end
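
For completeness, a usage sketch for the named-tuple version, applied to the dummy data from the question. The three methods are restated compactly so the block runs on its own. The `H5Parent` alias, the temporary file path and the read-back checks are my additions, and I’m assuming a reasonably recent HDF5.jl.

```julia
using HDF5

const H5Parent = Union{HDF5.File,HDF5.Group}  # alias introduced here for brevity

# Same three methods as above, in short form.
function write_namedtuples_to_hdf5_groups(parent::H5Parent, nt::NamedTuple)
    for (key, value) in pairs(nt)
        write_namedtuples_to_hdf5_groups(parent, key, value)
    end
end
write_namedtuples_to_hdf5_groups(parent::H5Parent, key::Symbol, nt::NamedTuple) =
    write_namedtuples_to_hdf5_groups(create_group(parent, string(key)), nt)
write_namedtuples_to_hdf5_groups(parent::H5Parent, key::Symbol, value) =
    write(parent, string(key), value)

# Dummy data from the question, as nested named tuples
a = collect(400:50:800)
allthestuff = (
    a = a,
    m = reshape(repeat(collect(1:900.0), length(a)), (30, 30, length(a))),
    g = (shape = "spheroid", a = 20.0, c = 40.0),
    e = (medium   = (epsilon = 1.0, mu = 1.0),
         particle = (epsilon = 1.0, mu = 1.0)),
)

path = tempname() * ".h5"   # throwaway file for the demo
h5open(path, "w") do h5f
    write_namedtuples_to_hdf5_groups(h5f, allthestuff)
end

# Read a few entries back to check the layout
shape, mu, a_back = h5open(path, "r") do h5f
    read(h5f, "g/shape"), read(h5f, "e/particle/mu"), read(h5f, "a")
end
```

The group layout should match the `h5ls` listing shown earlier for the `Dict` version, since each nested named tuple maps to a group and every other value to a dataset.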