Hi, I am working on a script to create a YAXArrays dataset from an HDF5 file. The idea is to iterate over the datasets in a selected group and combine them into a single dataset.
The first issue is that I have not found a way to load the data lazily. I couldn't find any documentation on this, so I currently read every dataset fully into memory.
The second issue is that I am unable to convert the dictionary of YAXArrays I created into a Dataset.
Please see the code below. I would greatly appreciate any help with these issues.
using HDF5
using YAXArrays
# Build a YAXArray from one HDF5 dataset, inferring dimension names from its attributes
function create_yaxarray(group_path, var_name, dataset)
    # Copy the HDF5 attributes into a plain Dict so entries can be added below
    all_attrs = Dict(attrs(dataset))
    # "DimensionNames" is optional; `nothing` marks a missing attribute
    dim_names = get(all_attrs, "DimensionNames", nothing)
    if dim_names === nothing
        println("Warning: No dimensions attribute for $var_name, using default.")
        dim_names = ["dim_$i" for i in 1:ndims(dataset)] # Default dimension names
    else
        dim_names = strip.(split(dim_names, ",")) # Split comma-separated dimensions
    end
    # One axis per dimension, indexed 1:n
    axlist = Tuple(
        Dim{Symbol(dim_name)}(collect(1:size(dataset)[dim_i]))
        for (dim_i, dim_name) in enumerate(dim_names))
    data = read(dataset) # Reads the whole dataset into memory (not lazy)
    all_attrs["name"] = var_name # Name of the variable
    all_attrs["source"] = group_path # Source group path
    return YAXArray(
        axlist,    # Tuple of dimensions
        data,      # The fully loaded data
        all_attrs  # Attributes, including name and source
    )
end
function load_dataset(filename::String, group::String = "/ScienceData/Geo")
    datasets = Dict{Symbol, YAXArray}()
    # Open the HDF5 file read-only
    fid = h5open(filename, "r")
    try
        grp = fid[group]
        # Iterate over the members of the selected group
        for var_name in keys(grp)
            # Index through the group; joinpath would insert "\" on Windows,
            # which is not a valid HDF5 path separator
            dataset = grp[var_name]
            datasets[Symbol(var_name)] = create_yaxarray(group, var_name, dataset)
        end
    finally
        close(fid) # Ensure the file is closed
    end
    return datasets
end
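On the lazy-loading question, one possible direction is memory mapping. The sketch below assumes the datasets are stored contiguously and uncompressed, which is what HDF5.ismmappable checks, and falls back to a full read otherwise. The temporary file, the dataset name "x", and the dimension names row and col are made up for illustration. A memory-mapped array is only paged in on access, which is not the same as chunk-aware lazy loading. For chunked or compressed data, a DiskArrays.jl-style wrapper around the HDF5 dataset would probably be needed instead (not shown here).

```julia
using HDF5
using YAXArrays

# Write a small example file (hypothetical data, for illustration only).
path = tempname() * ".h5"
h5open(path, "w") do fid
    fid["x"] = collect(reshape(1.0:6.0, 3, 2))
end

fid = h5open(path, "r")
dset = fid["x"]

# Memory-map the dataset when its layout allows it; otherwise read it fully.
data = HDF5.ismmappable(dset) ? HDF5.readmmap(dset) : read(dset)

# Dim is assumed to be available from YAXArrays, as in the code above.
axlist = (Dim{:row}(1:size(data, 1)), Dim{:col}(1:size(data, 2)))
cube = YAXArray(axlist, data)

total = sum(cube.data) # Touches the data while the file is still open
close(fid)
```

In load_dataset this would take the place of read(dataset). Whether a mapped array remains valid after close(fid) is worth verifying on the target platform before relying on it, since the original function closes the file before returning.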
When I call load_dataset, it returns a dictionary of YAXArrays as expected. However, when I try to convert this dictionary into a Dataset:
data_dict = load_dataset(filename, "/ScienceData/Geo")
ds = Dataset(;properties = Dict{String,Any}(), data_dict)
I get the following error message:
ERROR: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
iterate(::LibGit2.GitConfigIter)
@ LibGit2 /cm/shared/apps/julia/1.10.6/share/julia/stdlib/v1.10/LibGit2/src/config.jl:225
iterate(::LibGit2.GitConfigIter, ::Any)
@ LibGit2 /cm/shared/apps/julia/1.10.6/share/julia/stdlib/v1.10/LibGit2/src/config.jl:225
iterate(::LaTeXStrings.LaTeXString, ::Int64)
@ LaTeXStrings ~/.julia/packages/LaTeXStrings/6NrIG/src/LaTeXStrings.jl:108
…
Stacktrace:
[1] foreach(f::YAXArrays.Datasets.var"#3#6", itr::Nothing)
@ Base ./abstractarray.jl:3098
[2] (::YAXArrays.Datasets.var"#2#5")(c::Dict{Symbol, YAXArray})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/ppMtD/src/DatasetAPI/Datasets.jl:37
[3] foreach(f::YAXArrays.Datasets.var"#2#5", itr::@NamedTuple{data_dict::Dict{Symbol, YAXArray}})
@ Base ./abstractarray.jl:3098
[4] Dataset(; properties::Dict{String, Any}, cubes::@Kwargs{data_dict::Dict{Symbol, YAXArray}})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/ppMtD/src/DatasetAPI/Datasets.jl:35
[5] top-level scope
@ REPL[33]:1
I would appreciate any help on these two issues.
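Regarding the MethodError: the stack trace shows Dataset(; properties, cubes...) receiving the dictionary as a single keyword argument named data_dict, so the constructor appears to treat the Dict itself as one cube. A likely fix is to splat the dictionary so that each Symbol key becomes its own keyword argument. Here is a minimal sketch with two made-up in-memory arrays (:a and :b are placeholders for the HDF5 variables), assuming Dim is available from YAXArrays as in the code above:

```julia
using YAXArrays

# Two small in-memory arrays standing in for the HDF5 variables.
axlist = (Dim{:x}(1:3), Dim{:y}(1:2))
data_dict = Dict{Symbol,YAXArray}(
    :a => YAXArray(axlist, rand(3, 2)),
    :b => YAXArray(axlist, rand(3, 2)),
)

# Note the trailing `...`: each key/value pair is passed as `name = cube`.
ds = Dataset(; properties = Dict{String,Any}(), data_dict...)
```

Applied to the original call, this would read Dataset(; properties = Dict{String,Any}(), data_dict...). This relies on the dictionary keys being Symbols, which load_dataset already ensures.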