Loading/writing a single element from an array in JLD

jld

#1

I want to save and load data with JLD.jl (or HDF5.jl or something similar). The data is a 1-dimensional array whose elements are themselves large arrays, and I want to be able to work with one or a few of those elements at a time. Something like this, for example:

file = jldopen("mydata.jld", "w")
file["a"][1] = rand(5)  # does not work
close(file)

A simple solution would be to use the dictionary-like access also for the index elements:

save("tmp/test.jld", "variable_$i", v)
load("tmp/test.jld", "variable_$i")
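A minimal sketch of this per-element approach, assuming JLD.jl's `jldopen`, `write`, and `read`. The names `path` and `elements` and the `variable_$i` keys are only illustrative. Each element becomes its own dataset, so one element can be read back without loading the others, and reopening with "r+" allows more to be added later. As far as I can tell, `save` creates the file from scratch on each call, so the sketch writes through an open file handle instead.

```julia
using JLD

path = tempname() * ".jld"               # illustrative scratch file
elements = [rand(5), rand(4), rand(3)]   # stand-ins for the large arrays

# Write every element as its own dataset, keyed by its index.
jldopen(path, "w") do file
    for (i, v) in enumerate(elements)
        write(file, "variable_$i", v)
    end
end

# Read back only the second element; the others stay on disk.
second = jldopen(path, "r") do file
    read(file, "variable_2")
end

# Append a fourth element later without rewriting the existing ones.
jldopen(path, "r+") do file
    write(file, "variable_4", rand(2))
end

fourth = jldopen(path, "r") do file
    read(file, "variable_4")
end
```

The trade-off is that the index lives in the dataset name, so iterating over elements means building the key strings yourself.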

Is there a better solution? According to the HDF5.jl documentation, it is possible to save a subset of an array. Can this also be accomplished with JLD?
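For reference, writing a subset of an array with plain HDF5.jl might look like the sketch below. It assumes all elements have the same length, so they can be stored as the columns of one 2-dimensional dataset, and it assumes a recent HDF5.jl in which the function is called `create_dataset` (older releases spelled it `d_create`). The dataset name "a" and the 5×3 shape are made up for illustration.

```julia
using HDF5

path = tempname() * ".h5"   # illustrative scratch file

h5open(path, "w") do file
    # Allocate a 5×3 Float64 dataset on disk without passing any data.
    dset = create_dataset(file, "a", datatype(Float64), dataspace(5, 3))
    # Write only the first column.
    dset[:, 1] = rand(5)
end

# Reopen in read/write mode and overwrite a single column in place.
h5open(path, "r+") do file
    dset = file["a"]
    dset[:, 2] = ones(5)
end

# Read back just that column; vec flattens whatever shape the slice has.
col = h5open(path, "r") do file
    vec(file["a"][:, 2])
end
```

This only covers the equal-length case. Elements of differing lengths, as in `[rand(5), rand(4), rand(3)]`, do not fit a rectangular dataset.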

EDIT: The problems I had were because I tried to create and overwrite a dataset all at once. If I use the array syntax, it is possible to modify the contents of an existing dataset. This was unclear to me from the documentation, since there was no example showing this.

So this would not work:

using HDF5, JLD

jldopen("mydata.jld", "w") do file
    g = g_create(file, "mygroup")
    g["x"] = "foo"
end

jldopen("mydata.jld", "r+") do file
    g = file["mygroup"]
    g["x"] = "bar"
end

Nor does g["x"][1] = "bar". This, however, works:

using HDF5, JLD

jldopen("mydata.jld", "w") do file
    g = g_create(file, "mygroup")
    g["x"] = ["foo", "bar"]
end

jldopen("mydata.jld", "r+") do file
    g = file["mygroup"]
    g["x"][:] = ["baz", "boo"]
end

This is probably all in the documentation. I’m just leaving this here to reduce confusion for others.


#2

According to the documentation you can mmap the data.

file = jldopen("mydata.jld", "r", mmaparrays=true)
y = read(file, "y")   # y will be a mmapped array, not read immediately in its entirety

Then you can read individual elements or slices (e.g. y[3][5:11]).


#3

julia> a = jldopen(tempname(),"w")
Julia data file version 0.1.1: /tmp/juliaFzEqKh

julia> a["a"] = [rand(5),rand(4),rand(3)]
3-element Array{Array{Float64,1},1}:
 [0.138439, 0.79078, 0.977075, 0.271019, 0.236018]
 [0.514218, 0.823196, 0.647368, 0.869087]         
 [0.93793, 0.410167, 0.129439]                    

julia> a["a"][2]
1-element Array{Array{Float64,1},1}:
 [0.514218, 0.823196, 0.647368, 0.869087]

#4

This is very useful! But I was unclear in the title: I also need to write elements to the array. I can mmap to load the data, but to write I have to close the file and open it again in write mode:

file = jldopen("mydata.jld", "w", mmaparrays=true)
file["x"][1] = ones(5)  # does not work since we are in write mode!

# this does not work either, since the file is closed at some point
file = jldopen("mydata.jld", "r", mmaparrays=true)
x = file["x"]
close(file)
file = jldopen("mydata.jld", "w", mmaparrays=true)
file["x"] = x  # ERROR: Error opening object //x
close(file)