How can I read the names of the variables/objects stored in a file using HDF5 in Julia?

I have written code that runs different experiments and automatically creates a folder for each experiment and, inside it, subfolders for the different parameters I have used for that experiment. Inside these subfolders I save an .h5 file with the results of the experiment, which can be one or more variables.

Now I would like to write a function that reads the variables saved in these files, given the full path name and either the name of the specific variable I am interested in or no variable name at all (in which case it should read all of them).

As a starting point I am using code that someone else was using to read attributes or something similar; please see the following:

function readManyFilesHDF5(homePath::String, groupName::String, N::Int64)
    original_path_name = pwd() # remember the current working directory
    @show original_path_name

    roots = homePath # full path to the folder that contains the files under consideration
    files = readdir(roots) # names of the files in that folder

    use_file = copy(files) # copy of the list of file names, meant for removing the names that have already been read
    for file_name in files # visit each file in the folder in sequence
        fname = joinpath(roots, file_name) # full path to the file under consideration
        h5open(fname, "r") do fid # this is the 'fid' used in 'readattr' function as the first argument
            variableNames = names(fid) # get the names of variables in the file under consideration
            data = [fid[joinpath(key, "rest/of/path")] for key in variableNames] # get the data stored in those variables
        end
    end

    cd(original_path_name) # restore the original working directory
    @show pwd()
end


but I am not able to read the names of the variables/objects saved in those files; please see this part of the code:

variableNames = names(fid) # get the names of variables in the file under consideration

Not answering your question, but this might be of interest:

names reads the variable names in a group. So you need to know which group you wrote the variables to. See the docs.

If you do

g = fid["mygroup"]

then names(g) gives you the names for that group.
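A minimal sketch of that idea, assuming a recent HDF5.jl release, where the listing function is keys rather than names. The file name, group name and dataset names here are made up for illustration only:

```julia
using HDF5

# Hypothetical file, group and dataset names, used only for illustration.
path = joinpath(mktempdir(), "example.h5")

# Write two datasets into a group called "mygroup".
h5open(path, "w") do fid
    g = create_group(fid, "mygroup")
    g["psnr"] = [30.1, 31.4]
    g["ssim"] = [0.91, 0.93]
end

# Reopen the file and list what is stored inside the group.
group_members = h5open(path, "r") do fid
    g = fid["mygroup"]
    keys(g)  # names of the objects inside the group, as a Vector{String}
end
```

On older releases that still provide names, the same lookup would presumably be names(g).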

One bigger picture question is: why use HDF5 directly rather than a wrapper like JLD?

@hendri54 Hi Lutz, I tried your solution as follows:

function readManyFilesHDF5(homePath::String)
    original_path_name = pwd()
    @show original_path_name

    roots = homePath
    @show roots
    files = readdir(roots)
    @show files
    for file_name in files
        @show file_name
        h5open(file_name, "r") do fid
            groupNames = names(fid)
            @show groupNames
            for keyGroup in groupNames
                @show keyGroup
                @show fid
                g = fid[keyGroup]
                @show g
                variableNames = names(g)
                # variableNames = names(keyGroup)
                @show variableNames
                for keyVariable in variableNames
                    @show typeof(h5read(file_name, keyGroup))
                    global data = h5read(file_name, keyGroup)
                end
            end
        end
    end

    @show pwd()
    return data
end

but at:

 variableNames = names(g)

it gives me this error:

ERROR: MethodError: no method matching names(::HDF5Dataset)

before the error the printing on REPL is as follows:

original_path_name = "/home/user/.julia/dev/PreImageProblems/experiments/Euclidean/_angles"
roots = "/home/user/.julia/dev/PreImageProblems/experiments/Euclidean/_angles"
files = ["mydata.h5", "mynewData.h5"]
file_name = "mydata.h5"
groupNames = ["myGroupPSNR", "myGroupSSIM", "myGroupXq"]
keyGroup = "myGroupPSNR"
fid = HDF5 data file: mydata.h5
g = HDF5 dataset: /myGroupPSNR (file: mydata.h5 xfer_mode: 0)

I’m afraid this is now above my paygrade.

It looks like the objects in groupNames are not groups but datasets.

Suppose you try

obj = fid["myGroupPSNR"]
data = read(obj)

That might help to identify what myGroupPSNR actually points to. At this point, it would be helpful to have the input of someone who has a lot better understanding of HDF5, though.
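One way to check this directly is to look at the type of each top-level object before trying to list its contents. The following is only a sketch, assuming a recent HDF5.jl release where the types are called HDF5.Group and HDF5.Dataset and the listing function is keys; describe_top_level is a hypothetical helper name:

```julia
using HDF5

# Hypothetical helper: report whether each top-level object in the file
# is a group (which can be listed) or a dataset (which can only be read).
function describe_top_level(path::AbstractString)
    h5open(path, "r") do fid
        kinds = Dict{String,Symbol}()
        for name in keys(fid)
            obj = fid[name]
            kinds[name] = obj isa HDF5.Group   ? :group :
                          obj isa HDF5.Dataset ? :dataset : :other
        end
        kinds
    end
end
```

If myGroupPSNR is reported as a dataset, then read(fid["myGroupPSNR"]) returns its contents and there are no further names to list below it.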

Also: are you tied to the HDF5 format or could you use a wrapper, such as JLD, which is better documented and hides the internals more?

@musm Hi Mustafa, sorry to disturb you but I saw you are quite active recently with contributions to HDF5.jl

My question is how to get the names of the variables saved/written in a group of a .h5 file?

As a bandaid solution, did you consider writing just one object per file, where this object is a NamedTuple or Dict that has the names of the variables as its keys? That way the variable names are inside the object itself.
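A rough sketch of something close to this with plain HDF5.jl, assuming a recent release: instead of storing the Dict as a single object, each Dict key becomes a dataset name, so the names can be recovered with keys when reading. The function names save_results and load_results are hypothetical:

```julia
using HDF5

# Hypothetical helper: write every entry of a Dict as a top-level dataset,
# using the Dict key as the dataset name.
function save_results(path::AbstractString, results::AbstractDict{<:AbstractString})
    h5open(path, "w") do fid
        for (name, value) in results
            fid[name] = value
        end
    end
    return path
end

# Hypothetical helper: rebuild the Dict from whatever names the file contains.
function load_results(path::AbstractString)
    h5open(path, "r") do fid
        Dict{String,Any}(name => read(fid[name]) for name in keys(fid))
    end
end
```

This only covers values that HDF5.jl can write directly, such as numbers, strings and numeric arrays.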

@Henrique_Becker Hi Henrique and thank you for your bandaid solution (I did consider it myself), I appreciate it very much, but in my case I already have a solution that singles out a group as follows:

function readManyFilesHDF5(homePath::String, fileName::String, groupName::String) # this is kind of final
    original_path_name = pwd()

    roots = homePath
    checkExistFile = false
    checkExistGroup = false

    files = readdir(roots)

    data = Dict{String,Any}()
    if length(filter(x->occursin(fileName,x),files))>=1

        h5open(fileName,"r") do fid
            groupNames = names(fid)

            if length(filter(x->occursin(groupName,x),groupNames))>=1

                push!(data, groupName => h5read(fileName, groupName))
                checkExistGroup = true
            end
        end

        checkExistFile = true
    end

    return checkExistFile, checkExistGroup, data
end

and within this framework I can write only one variable per group and use the same name both for the group and for the corresponding variable. Once I single out the group automatically, I can single out the variable as well. But this, as you have said, is just another bandaid solution :slight_smile:.

This post pops up when searching for the issue I had, so I thought I'd make a quick note here for others. I tried running some older code and had this error

MethodError: no method matching names(::HDF5.Group)

It turns out that at some point names was renamed to keys.
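With keys, the kind of reader asked for at the top of this thread could look roughly like the sketch below. It assumes a recent HDF5.jl release and that the variables are stored as top-level objects in each file; read_folder and its variable keyword are hypothetical names:

```julia
using HDF5

# Hypothetical helper: read every .h5 file in `dir`. With `variable = nothing`
# all top-level objects are read; otherwise only the one with that name.
function read_folder(dir::AbstractString; variable=nothing)
    results = Dict{String,Any}()
    for file in readdir(dir)
        endswith(file, ".h5") || continue
        # joinpath avoids depending on the current working directory
        results[file] = h5open(joinpath(dir, file), "r") do fid
            wanted = variable === nothing ? keys(fid) : filter(==(variable), keys(fid))
            Dict{String,Any}(k => read(fid[k]) for k in wanted)
        end
    end
    return results
end
```

The result maps each file name to a Dict of variable name to value, so read_folder(dir)["mydata.h5"] would hold everything stored in that file, and a file that lacks the requested variable simply yields an empty Dict.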


@miromarszal Thank you!! Brand new to Julia, just trying to run a code project that someone put together from a while ago and was getting the no matching method for names. Switched to keys and working!