Is there a way to load only a part of a dataset in a JLD2 file?

There isn’t any ergonomic way to do this.

My recommendation would be to change the way you store the data. Split the nested vectors, at least partially, into nested JLD2 groups (for the nesting levels) and separate datasets (for the leaves), so that each piece can be loaded on its own.
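Here is a minimal sketch of that layout, using the same toy data as the REPL session. It relies on JLD2's slash-separated paths: writing to `"data/2/1"` creates the intermediate groups `data` and `data/2`. The file name `nested.jld2` and the `data/i/j` naming scheme are only illustrative choices.

```julia
using JLD2

data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Store every innermost vector as its own dataset. Writing to a path such as
# "data/2/1" makes JLD2 create the groups "data" and "data/2" as needed.
jldopen("nested.jld2", "w") do f
    for (i, outer) in enumerate(data), (j, inner) in enumerate(outer)
        f["data/$i/$j"] = inner
    end
end

# Later, read back only the piece you need; the other datasets stay on disk.
piece = jldopen("nested.jld2", "r") do f
    f["data/2/1"]
end
println(piece)   # [5, 6]
```

The right granularity depends on your data. Very many tiny datasets add overhead, so splitting only the outer levels and keeping reasonably sized leaf arrays is often a better trade-off.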

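If you really want to read the file you already have (and assuming it contains only arrays of numbers), I'd recommend using HDF5.jl directly to open the file and manually dereference the nested arrays. A JLD2 file is also a valid HDF5 file. Each inner vector is stored as a separate dataset that the outer one points to through an HDF5 object reference, so you can follow those references one level at a time: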

```julia
julia> using JLD2, HDF5

julia> data = [ [ [1,2], [3,4] ], [ [5,6], [7,8] ] ]
2-element Vector{Vector{Vector{Int64}}}:
 [[1, 2], [3, 4]]
 [[5, 6], [7, 8]]

julia> jldsave("test.jld2"; data)

julia> f = h5open("test.jld2")
🗂️ HDF5.File: (read-only) test.jld2
├─ 📂 _types
│  └─ 📄 00000001
│     ├─ 🏷️ julia_type
└─ 🔢 data
   ├─ 🏷️ julia_type

julia> d = f["data"]
🔢 HDF5.Dataset: /data (file: test.jld2 xfer_mode: 0)
├─ 🏷️ julia_type

julia> d = read(f, "data")
2-element Vector{HDF5.Reference}:
 HDF5.Reference(HDF5.API.hobj_ref_t(0x0000000000001330))
 HDF5.Reference(HDF5.API.hobj_ref_t(0x00000000000014a8))

julia> ref = d[2]
HDF5.Reference(HDF5.API.hobj_ref_t(0x00000000000014a8))

julia> vec1 = read(f[ref])
2-element Vector{HDF5.Reference}:
 HDF5.Reference(HDF5.API.hobj_ref_t(0x0000000000001540))
 HDF5.Reference(HDF5.API.hobj_ref_t(0x00000000000015b0))

julia> ref2 = vec1[1]
HDF5.Reference(HDF5.API.hobj_ref_t(0x0000000000001540))

julia> read(f[ref2])
2-element Vector{Int64}:
 5
 6
```
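The manual steps above can be wrapped in a small helper that follows one index per nesting level. This is only a sketch and is not part of JLD2 or HDF5.jl. The name `read_nested` and the file name `nested_refs.jld2` are hypothetical. The helper assumes the layout shown in the session: every non-leaf level is a dataset of `HDF5.Reference`s, and the leaves are plain numeric arrays.

```julia
using JLD2, HDF5

# Hypothetical helper (not a JLD2/HDF5.jl API): walks a nested vector that
# JLD2 stored as datasets of HDF5 object references. It follows one index
# per level and returns the leaf array. Only the reference lists along the
# path and the selected leaf are read from disk.
function read_nested(f::HDF5.File, name::AbstractString, path)
    obj = read(f, name)                  # top level: Vector{HDF5.Reference}
    for i in path
        ref = obj[i]
        ref isa HDF5.Reference ||
            error("level indexed by $i is not a list of references")
        ds = f[ref]                      # open the referenced dataset
        obj = read(ds)                   # more references, or the leaf data
        close(ds)
    end
    return obj
end

# Usage on the same toy data as in the REPL session
data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
jldsave("nested_refs.jld2"; data)

leaf = h5open("nested_refs.jld2", "r") do f
    read_nested(f, "data", (2, 1))
end
println(leaf)   # [5, 6]
```

If a leaf is itself large, HDF5.jl can also read just a slice of a dataset by indexing it. For example, on the last level you could use `ds[1:100]` instead of `read(ds)`.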