Hi,
I am trying to read data from an HDF5 file that is larger than my computer's memory. Each time, I think I only need part of the data in the file. Is there a way to get part of the groups/datasets in the file without loading the whole file? Thanks
Yes. That’s how HDF5.jl usually works.
Could you share some example code demonstrating your problem?
Here’s a demonstration of creating an 8 GB file and then retrieving a single element.
julia> using HDF5
julia> h5open("bigfile.h5", "w") do h5f
h5f["large_dataset"] = rand(1024, 1024, 1024)
end;
julia> g() = h5open("bigfile.h5") do h5f
h5f["large_dataset"][1024,512,256]
end
g (generic function with 1 method)
julia> @time g()
0.000527 seconds (51 allocations: 2.031 KiB)
0.5066790863746067
julia> @time g()
0.001117 seconds (51 allocations: 2.031 KiB)
0.5066790863746067
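Indexing a dataset handle like this reads only the requested region from disk (a hyperslab selection), so the same idea extends to whole slices, not just single elements. A minimal sketch, assuming the bigfile.h5 created above:

```julia
using HDF5

# Read a single 1024×1024 slab: only that slice is transferred
# from disk, not the full 8 GB dataset.
slice = h5open("bigfile.h5", "r") do h5f
    dset = h5f["large_dataset"]  # lazy dataset handle; no data read yet
    dset[:, :, 1]                # hyperslab read of the first slab only
end

size(slice)  # (1024, 1024)
```

The dataset handle `h5f["large_dataset"]` behaves like a lazy array: data is only read when you index into it.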
You could also potentially memory-map the file; look for mmap.
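For the memory-mapping route, HDF5.jl provides `readmmap`, which returns an `Array` backed by the file so the OS pages data in on demand. A sketch, again assuming the bigfile.h5 from above (note this only works for contiguously stored datasets, i.e. not chunked or compressed ones):

```julia
using HDF5

h5open("bigfile.h5", "r") do h5f
    # Memory-mapped view; no bulk read happens here.
    A = HDF5.readmmap(h5f["large_dataset"])
    # Touching an element pages in just the region around it.
    A[1024, 512, 256]
end
```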
Hi Mark,
Yes, this works. Before, I thought h5open would load the whole file, which is too large (the file I am using is tens of GB), but I tried what you suggested and everything is good!
Hi,
I have a further question about this. Suppose my file has many groups (like 10000) and I want to get just 1000 random groups each time. Is there a way to do this? Thanks.
For an array I know I can do sample(data, 1000, replace = true)
(the reason I want replace to be true is that I am trying to do something like bootstrapping), but I don’t know how to do this for groups in an HDF5 file.
You could do something like this, but note that it is no longer lazy: every sampled dataset is read into memory.
julia> h5open("test.h5", "w") do h5f
h5f["r/a"] = 1
h5f["r/b"] = 2
h5f["r/c"] = 3
h5f["r/d"] = 4
end
4
julia> using StatsBase

julia> h5open("test.h5", "r") do h5f
_samples = sample(keys(h5f["r"]), 1000, replace = true)
map(_samples) do _sample
h5f["r"][_sample][]
end
end
1000-element Vector{Int64}:
4
2
1
1
2
3
4
1
2
2
⋮
4
3
4
1
2
3
1
3
3
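If even the 1000 sampled values are too much to hold at once, you can keep the reads lazy by sampling only the key names up front and reading each group when you need it, e.g. inside a loop. A sketch along the same lines (the running mean here is a hypothetical placeholder for whatever bootstrap statistic you actually compute):

```julia
using HDF5, StatsBase

h5open("test.h5", "r") do h5f
    grp = h5f["r"]
    # Sampling the key names is cheap; no dataset is read yet.
    names = sample(keys(grp), 1000, replace = true)
    acc = 0.0
    for name in names
        acc += grp[name][]  # one dataset read per iteration
    end
    acc / length(names)     # e.g. a bootstrap mean
end
```

This way at most one sampled dataset is in memory at a time.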