How can I read the names of the variables/objects stored in a file using HDF5 in Julia?

Hi guys, I hope you all have been safe from this covid19 virus.

I have written code that runs different experiments and automatically builds a folder for each experiment, with subfolders for the different parameters I used for the same experiment. Inside these subfolders I save an .h5 file with the results of the experiment, which can be one or more variables.

Now I would like to build a function that reads the variables saved in these files, either by providing the full path and the name of the specific variable I am interested in, or without providing a variable name (in which case it reads all of them).

As a starting point I am using code that someone else wrote to read attributes or something similar; please see the following:

function readManyFilesHDF5(homePath::String, groupName::String, N::Int64)
    original_path_name = pwd() # get the actual path name
    @show original_path_name

    roots = homePath # get the full path to the folder which contains the files under consideration
    files = readdir(roots) # read the file names of the files in the folder under consideration

    use_file = copy(files) # create a copy of the list of the file names, and use it later to remove the file names which have been read already
    for file_name in files # get each file name, in the folder under consideration, in sequence
        fname = joinpath(roots, file_name) # creates the full path to the file in consideration
        h5open(fname, "r") do fid # this is the 'fid' used in 'readattr' function as the first argument
            global use_file # the original copy of the list of file names in the folder under consideration
            variableNames = names(fid) # get the names of variables in the file under consideration
            data = [fid[joinpath(key, "rest/of/path")] for key in variableNames] # get the data stored in the variables written in file under consideration
        end
    end

    cd(original_path_name) # restore the original path name
    @show pwd()

    println(length(use_file))
end

but I am not able to read the names of the variables/objects saved in those files; please refer to this part of the code:

variableNames = names(fid) # get the names of variables in the file under consideration

I thank you very much in advance for your time and support. Thank you very much also for your understanding.

Cheers

Ergnoor

Not answering your question, but this might be of interest:

names reads the variable names in a group. So you need to know which group you wrote the variables to. See the docs.

If you do

g = fid["mygroup"]
names(g)

You get the names for that group.
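A minimal sketch of this, assuming a recent HDF5.jl release in which `keys` is used where older code called `names`. The file name `example.h5`, the group name and the variable names are made up for illustration:

```julia
using HDF5

path = joinpath(mktempdir(), "example.h5")  # hypothetical file in a temp folder

# Write two variables into a group called "mygroup".
h5open(path, "w") do fid
    g = create_group(fid, "mygroup")
    g["psnr"] = [30.1, 31.4]
    g["ssim"] = [0.91, 0.93]
end

# Read back the names of the variables stored in that group.
variable_names = h5open(path, "r") do fid
    g = fid["mygroup"]
    keys(g)  # on older HDF5.jl versions this was `names(g)`
end

println(variable_names)
```

The do-block form closes the file even if something inside it throws, which is why the names are returned from the block instead of being stored in a global.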

One bigger picture question is: why use HDF5 directly rather than a wrapper like JLD?

Thank you very much, Lutz, for the answer and Greg for the suggestion; I appreciate it very much.

Cheers.

Ergnoor

@hendri54 Hi Lutz, I tried your solution as follows:

function readManyFilesHDF5(homePath::String)
    original_path_name = pwd()
    @show original_path_name

    roots = homePath
    @show roots
    files = readdir(roots)
    @show files
    for file_name in files
        @show file_name
        h5open(file_name, "r") do fid
            groupNames = names(fid)
            @show groupNames
            for keyGroup in groupNames
                @show keyGroup
                @show fid
                g = fid[keyGroup]
                @show g
                variableNames = names(g)
                # variableNames = names(keyGroup)
                @show variableNames
                for keyVariable in variableNames
                    @show typeof(h5read(file_name, keyGroup))
                    global data = h5read(file_name, keyGroup)
                end
            end
        end
    end

    cd(original_path_name)
    @show pwd()
    return data
end

but at:

 variableNames = names(g)

it gives me this error:

ERROR: MethodError: no method matching names(::HDF5Dataset)

Before the error, the REPL prints the following:

original_path_name = "/home/user/.julia/dev/PreImageProblems/experiments/Euclidean/_angles"
roots = "/home/user/.julia/dev/PreImageProblems/experiments/Euclidean/_angles"
files = ["mydata.h5", "mynewData.h5"]
file_name = "mydata.h5"
groupNames = ["myGroupPSNR", "myGroupSSIM", "myGroupXq"]
keyGroup = "myGroupPSNR"
fid = HDF5 data file: mydata.h5
g = HDF5 dataset: /myGroupPSNR (file: mydata.h5 xfer_mode: 0)

Thank you again for your time and support.

Cheers

Ergnoor

I’m afraid this is now above my paygrade.

It looks like the objects in groupNames are not groups but datasets.

Suppose you try

obj = fid["myGroupPSNR"]
data = read(obj)

That might help to identify what myGroupPSNR actually points to. At this point, it would be helpful to have the input of someone who has a much better understanding of HDF5, though.
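One way to check this for every top-level name is to test the type of each object before asking for its contents. This is only a sketch: it assumes a recent HDF5.jl where the types are called `HDF5.Group` and `HDF5.Dataset` (the error above shows the older name `HDF5Dataset`) and where `keys` is used instead of `names`. The file and the group name `results` are invented for the example:

```julia
using HDF5

path = joinpath(mktempdir(), "example.h5")  # hypothetical file

# One dataset at the top level and one dataset inside a real group.
h5open(path, "w") do fid
    fid["myGroupPSNR"] = [30.1, 31.4]   # a dataset, despite its name
    g = create_group(fid, "results")
    g["ssim"] = [0.91, 0.93]
end

# Record what each top-level name actually points to.
kinds = Dict{String,String}()
h5open(path, "r") do fid
    for name in keys(fid)
        obj = fid[name]
        if obj isa HDF5.Group
            kinds[name] = "group containing $(keys(obj))"
        elseif obj isa HDF5.Dataset
            kinds[name] = "dataset of type $(typeof(read(obj)))"
        end
    end
end

for (name, kind) in kinds
    println(name, " => ", kind)
end
```

Only groups (and the file itself) have child names, so calling `keys` or `names` on a dataset is what produces a `MethodError` like the one above.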

Also: are you tied to the HDF5 format or could you use a wrapper, such as JLD, which is better documented and hides the internals more?

@musm Hi Mustafa, sorry to disturb you but I saw you are quite active recently with contributions to HDF5.jl

My question is how to get the names of the variables saved/written in a group of a .h5 file?

Thank you very much for your time and support.

Cheers

Ergnoor

As a bandaid solution, did you consider writing just one object per file, where this object is a NamedTuple or Dict that has the names of the variables as its keys? That way the variable names are inside the object itself.

@Henrique_Becker Hi Henrique, and thank you for your bandaid solution (I did consider it myself). I appreciate it very much, but in my case I already have a solution that singles out a group, as follows:

function readManyFilesHDF5(homePath::String, fileName::String, groupName::String) # this is kind of final
    original_path_name = pwd()

    roots = homePath
    checkExistFile = false::Bool
    checkExistGroup = false::Bool

    files = readdir(roots)

    data = Dict{String,Any}()
    if length(filter(x -> occursin(fileName, x), files)) >= 1

        h5open(fileName, "r") do fid
            groupNames = names(fid)

            if length(filter(x -> occursin(groupName, x), groupNames)) >= 1

                push!(data, groupName => h5read(fileName, groupName))
                checkExistGroup = true::Bool

            end
        end
        checkExistFile = true::Bool
    end

    cd(original_path_name)

    return checkExistFile, checkExistGroup, data
end

Within this framework I write only one variable per group and use the same name both for the group and for the corresponding variable. Once I single out the group automatically, I can single out the variable too. But this, as you have said, is just another bandaid solution :).

Thank you very much for your time and support.

Cheers.

Ergnoor

This post pops up when searching for the issue I had, so I thought I’d make a quick note here for others. I tried running some older code and got this error:

MethodError: no method matching names(::HDF5.Group)

It turns out that at some point names was renamed to keys.
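With `keys`, the function asked for at the top of the thread could be sketched roughly as below. This is not taken from any post above: `collect_datasets!` and `read_variables` are hypothetical helper names, the test file is invented, and the code assumes a recent HDF5.jl with `HDF5.Group` and `HDF5.Dataset`:

```julia
using HDF5

# Hypothetical helper: collect every dataset under `parent` (a file or a group),
# descending into subgroups. The keys of `out` are HDF5 paths such as "metrics/psnr".
function collect_datasets!(out::Dict{String,Any}, parent, prefix::String="")
    for name in keys(parent)
        obj = parent[name]
        full = isempty(prefix) ? name : prefix * "/" * name
        if obj isa HDF5.Group
            collect_datasets!(out, obj, full)
        elseif obj isa HDF5.Dataset
            out[full] = read(obj)
        end
    end
    return out
end

# Read one variable if `name` is given, otherwise read all of them.
function read_variables(path::AbstractString; name::Union{Nothing,String}=nothing)
    h5open(path, "r") do fid
        if name === nothing
            collect_datasets!(Dict{String,Any}(), fid)
        else
            Dict{String,Any}(name => read(fid[name]))
        end
    end
end

# Small self-check on a temporary file.
path = joinpath(mktempdir(), "example.h5")
h5open(path, "w") do fid
    fid["Xq"] = [1.0, 2.0, 3.0]
    g = create_group(fid, "metrics")
    g["psnr"] = 30.1
end

all_vars = read_variables(path)
one_var = read_variables(path; name="metrics/psnr")
```

Passing the full path to `h5open` avoids relying on the current working directory, so no `cd` or `pwd` bookkeeping is needed.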