Is there a Julia function that returns h5 group structures à la h5dump?

I’m looking for a function that maps h5 data structures, i.e. variable paths, names and Types, and attribute paths, names and Types. This would be similar to h5dump.

See HDF5.jl.

I’ve looked through the documentation, and it’s a fantastic library. But I don’t see any support for h5dump-like functionality. The closest would be what’s displayed by show. There doesn’t seem to be anything that returns the full data tree path + attribute names. I could do this with HDF5.jl and a number of nested loops and if statements, but I was wondering if this had already been done, as I see this kind of functionality as very useful.

Based on some old discussion threads about HDF5.jl, I believe the default in-REPL file display should show most of the information you’re looking for: Using HDF5.jl and reading h5 data to Julia? - #7 by stephancb

I think most of what I need is displayed by show, but I guess there’s no function that returns a Dict or NamedTuple with that information, possibly with options to return additional information like Types?

Oh, I was under the impression you just wanted printed output since that’s all h5dump does. I also didn’t find any functionality for obtaining an in-memory representation of overall file structure. It might be worth asking on GitHub or Slack about whether there’s some external library for this (or if it should be added as a feature request).

All of the information is there, you just need to loop over the contents. e.g.

using HDF5
f = h5open("test.h5", "r")

returns an object f that acts like a dictionary of names => datasets/groups, which you can iterate over like any other dictionary-like iterator (e.g. use keys(f) to get a list of names). If there is a dataset "array", then you can query its size with size(f["array"]) and its element type with eltype(f["array"]). You can loop over its attributes with attributes(f["array"]), and so forth.
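To make the looping described above concrete, here is a minimal sketch of a recursive walk that collects the path, element type, size, and attribute names of every group and dataset. The function name `h5_structure`, the field names of the returned entries, and the demo file contents are all made up for illustration; they are not part of HDF5.jl. Only `h5open`, `keys`, indexing by name, `attributes`, `eltype`, `size`, `create_group` and `write_attribute` come from the library.

```julia
using HDF5

# Hypothetical helper (not part of HDF5.jl): recursively walk a file or group
# and return one NamedTuple per group/dataset found below it.
function h5_structure(parent::Union{HDF5.File,HDF5.Group}, prefix::AbstractString="")
    entries = NamedTuple[]
    for name in keys(parent)
        obj = parent[name]
        objpath = string(prefix, "/", name)
        attrnames = collect(keys(attributes(obj)))
        if obj isa HDF5.Group
            push!(entries, (path=objpath, kind=:group, eltype=nothing,
                            dims=nothing, attributes=attrnames))
            # Descend into the subgroup and append everything found there.
            append!(entries, h5_structure(obj, objpath))
        elseif obj isa HDF5.Dataset
            push!(entries, (path=objpath, kind=:dataset, eltype=eltype(obj),
                            dims=size(obj), attributes=attrnames))
        end
        # Other object kinds (e.g. committed datatypes) are skipped in this sketch.
    end
    return entries
end

# Demo on a small throwaway file (contents invented for this example).
filename = tempname() * ".h5"
h5open(filename, "w") do f
    f["array"] = rand(3, 4)
    write_attribute(f["array"], "units", "m")
    g = create_group(f, "group")
    g["values"] = collect(1:5)
end

structure = h5open(h5_structure, filename, "r")
for entry in structure
    println(entry)
end
```

The result is an ordinary vector, so it can be filtered or turned into a Dict keyed by path if that is more convenient.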

Your original post asked for a function that returns data "structures", and this is what HDF5.jl provides. If you want h5dump-like pretty printing, you could add that on top of this. But HDF5.jl is more focused on making the information available programmatically, not as text. (If you just want text, why not use h5dump?)

So I’m looking for an in-memory dump of the full structure so that I can easily query the full h5 as opposed to needing to write a more complex loop structure to retrieve everything. I’m working with files that have very complex structures with multiple subgroup levels.

My goal is to write a high-level function that loads a subset of an h5 file into memory with a structure that mimics the original h5. The user would supply variable and attribute names with optional dimension bounds. For my specific application I was going to build a data tree of the file format so I could then construct a structure to place the data into. But maybe I’m going about this wrong.
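A rough sketch of what such a loader might look like, assuming the user passes dataset paths mapped either to `nothing` (read the whole dataset) or to a tuple of index ranges (read only that slice). The name `load_subset`, the shape of the `selection` argument, and the demo file are hypothetical; the sketch relies only on HDF5.jl's path indexing, `read`, and range indexing on datasets. Attributes are left out to keep it short.

```julia
using HDF5

# Hypothetical helper (not part of HDF5.jl): read selected datasets into nested
# Dicts that mirror the group hierarchy of the file.
# `selection` maps a dataset path to `nothing` (read everything) or to a tuple
# of indices/ranges (read only that slice).
function load_subset(filename::AbstractString, selection::AbstractDict)
    out = Dict{String,Any}()
    h5open(filename, "r") do f
        for (dpath, bounds) in selection
            parts = split(strip(dpath, '/'), '/')
            # Create (or reuse) one nested Dict per intermediate group.
            node = out
            for group in parts[1:end-1]
                node = get!(node, String(group)) do
                    Dict{String,Any}()
                end
            end
            dset = f[dpath]
            node[String(parts[end])] = bounds === nothing ? read(dset) : dset[bounds...]
        end
    end
    return out
end

# Demo on a small throwaway file (contents invented for this example).
filename = tempname() * ".h5"
h5open(filename, "w") do f
    f["array"] = reshape(collect(1.0:12.0), 3, 4)
    g = create_group(f, "group")
    g["values"] = collect(1:5)
end

subset = load_subset(filename, Dict("array" => (1:2, :), "group/values" => nothing))
```

Here `subset["array"]` holds only the first two rows, and `subset["group"]["values"]` is reached the same way as `f["group"]["values"]` would be on the open file.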

f = h5open(...) already acts like a data tree. Why do you care how it is stored internally?

What kind of data structure do you want? A dictionary of dictionaries, for example, will be accessed much like the f object you get from h5open.

Thanks @stevengj, I think I was just approaching the problem the wrong way. Let me take a step back and reassess how I can get HDF5 to do what I want. Thanks for all the guidance.

I’m guessing this issue is related:

which subsequently led to another forum thread:
