Can't read old JLD2 file

I ran `] update`, and now I can't read a .jld2 file I created yesterday. A diff of the current and previous Manifest.toml files shows that only the DiffEqDiffTools, LsqFit, NLSolversBase, NaturalSort, Rotations, and WeakRefStrings packages have changed. Activating a new, empty test environment and installing JLD2 shows that it depends on none of these packages. Can anyone explain why JLD2 is so fragile?
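For the manifest comparison, here is a minimal sketch using the TOML standard library. It assumes Julia 1.6 or later, and it assumes the two usual Manifest.toml layouts: the older flat one, and the `manifest_format = "2.0"` one with packages under `[deps]`. The function names `manifest_versions` and `changed_packages` are hypothetical helpers, not part of Pkg.

```julia
using TOML

# Map package name => recorded version (falling back to the tree hash, or "?")
# from the text of a Manifest.toml. Handles the older flat layout and the
# `manifest_format = "2.0"` layout, where packages live under [deps].
function manifest_versions(toml_text::AbstractString)
    d = TOML.parse(toml_text)
    deps = haskey(d, "manifest_format") ? get(d, "deps", Dict{String,Any}()) : d
    out = Dict{String,String}()
    for (name, entries) in deps
        entries isa AbstractVector || continue  # skip non-package keys
        e = first(entries)
        out[name] = string(get(e, "version", get(e, "git-tree-sha1", "?")))
    end
    return out
end

# Sorted names of packages that were added, removed, or changed version.
function changed_packages(old_text::AbstractString, new_text::AbstractString)
    old, new = manifest_versions(old_text), manifest_versions(new_text)
    pkgs = union(keys(old), keys(new))
    return sort!([p for p in pkgs if get(old, p, nothing) != get(new, p, nothing)])
end
```

Usage, with hypothetical file names: `changed_packages(read("Manifest.toml.old", String), read("Manifest.toml", String))`.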

What error message are you getting?

```
ERROR: BoundsError: attempt to access 8249-element Array{Int64,1} at index [8250]
Stacktrace:
 [1] getindex at ./array.jl:731 [inlined]
 [2] getindex at ./multidimensional.jl:412 [inlined]
 [3] read_heap_object(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.GlobalHeapID, ::JLD2.ReadRepresentation{UInt8,UInt8}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/global_heaps.jl:130
 [4] jlconvert at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:700 [inlined]
 [5] jlconvert at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:766 [inlined]
 [6] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/dataio.jl:70 [inlined]
 [7] macro expansion at ./simdloop.jl:73 [inlined]
 [8] read_array! at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/dataio.jl:68 [inlined]
 [9] read_array(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::JLD2.ReadRepresentation{String,JLD2.Vlen{String}}, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:323
 [10] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadRepresentation{String,JLD2.Vlen{String}}, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:173
 [11] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datatypes.jl:76 [inlined]
 [12] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::UInt8, ::Int64, ::Int64, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:152
 [13] load_dataset(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.RelOffset) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:92
 [14] jlconvert at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:674 [inlined]
 [15] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/dataio.jl:70 [inlined]
 [16] macro expansion at ./simdloop.jl:73 [inlined]
 [17] read_array! at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/dataio.jl:68 [inlined]
 [18] read_array(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::JLD2.ReadRepresentation{AbstractArray{T,1} where T,JLD2.RelOffset}, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:323
 [19] macro expansion at ./logging.jl:309 [inlined]
 [20] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadRepresentation{Any,JLD2.RelOffset}, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:194
 [21] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:154 [inlined] (repeats 2 times)
 [22] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datatypes.jl:72 [inlined]
 [23] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::UInt8, ::Int64, ::Int64, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:152
 [24] load_dataset(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.RelOffset) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:92
 [25] jlconvert at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:674 [inlined]
 [26] macro expansion at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:1285 [inlined]
 [27] jlconvert(::JLD2.ReadRepresentation{getfield(JLD2.ReconstructedTypes, Symbol("##DataFrames.DataFrame#373")),JLD2.OnDiskRepresentation{(0, 8),Tuple{Any,getfield(JLD2.ReconstructedTypes, Symbol("##DataFrames.Index#372"))},Tuple{JLD2.RelOffset,JLD2.OnDiskRepresentation{(0, 8),Tuple{Dict{Symbol,Int64},Any},Tuple{JLD2.CustomSerialization{Array,JLD2.RelOffset},JLD2.RelOffset}}()}}()}, ::JLD2.JLDFile{JLD2.MmapIO}, ::Ptr{Nothing}, ::JLD2.RelOffset) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/data.jl:1232
 [28] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadRepresentation{getfield(JLD2.ReconstructedTypes, Symbol("##DataFrames.DataFrame#373")),JLD2.OnDiskRepresentation{(0, 8),Tuple{Any,getfield(JLD2.ReconstructedTypes, Symbol("##DataFrames.Index#372"))},Tuple{JLD2.RelOffset,JLD2.OnDiskRepresentation{(0, 8),Tuple{Dict{Symbol,Int64},Any},Tuple{JLD2.CustomSerialization{Array,JLD2.RelOffset},JLD2.RelOffset}}()}}()}, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/dataio.jl:37
 [29] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::UInt8, ::Int64, ::Int64, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:149
 [30] load_dataset(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.RelOffset) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/datasets.jl:92
 [31] getindex(::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, ::String) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/groups.jl:108
 [32] read(::JLD2.JLDFile{JLD2.MmapIO}, ::String) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/JLD2.jl:326
 [33] #3 at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/loadsave.jl:77 [inlined]
 [34] #jldopen#31(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(Main, Symbol("##3#4")), ::String) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/loadsave.jl:4
 [35] jldopen(::Function, ::String) at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/loadsave.jl:2
 [36] top-level scope at /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/KjBIK/src/loadsave.jl:76
```

It could be this. I am investigating…


FWIW, I am under the impression that JLD2 is effectively abandoned (major outstanding bugs involving data corruption have not been fixed for a long time).

So your best bet is probably to reproduce the previous environment, read your data back, and then save it in a more future-proof format (e.g., plain HDF5).


… which is very sad (and bad), IMHO.

First, JLD seemed to be the Julia file format. Then, during the Julia 1.0 transition, JLD was basically abandoned (it was only complaints from a bunch of users that led to fixes for the most essential things), and JLD2 was praised as the future. And now JLD2 seems abandoned.


That's not ideal, to say the least, especially since installing HDF5 on macOS tries to build gcc from source (it's an issue with Homebrew), which takes hours. That said, I've never had an issue with JLD2; I can even read files saved with v0.6 in v1.0, but maybe I've been lucky.

I think that something like JLD2 would have been nice, but in general, Julia’s type system is too rich for something like this to work reliably without a lot of effort.

A native HDF5 implementation would be useful, though that is also a lot of work.

I think that the existence of JLD2 in its current state misleads users into considering it as a reliable solution for long- or medium-term data externalization, with unpleasant surprises and costly debugging and data reproduction later on.


Installation of HDF5 is kind of an orthogonal issue. HDF5 is well known as the file format of choice for data, and I have not seen compatibility issues with it. JLD can work fine, but that depends on the types being used.

What should be clear is that plain HDF5 can require manual conversion (e.g., for complex numbers). It would be great to get better support for compound datatypes, which would improve the situation there. A stable way of storing (and restoring) Dicts would also be great.

I disagree with this: I think the best solution is to support datatypes that have a native or near-native correspondence with what HDF5 supports, and to require explicit conversion by the user for the rest.

This seems like unnecessary overhead, but making the mapping automatic is not a strategy that works well even in the medium run, as such mappings inevitably break down, sometimes silently (no error, just corrupted data).

IMO this has been the source of a lot of grief for users who thought they could just use JLD2 or similar as a fast, native, and reasonably robust data storage format. I think there are two viable approaches:

  1. ephemeral data one can regenerate at little cost: `Serialization.serialize` and friends, or `Mmap.mmap`, depending on the details,

  2. structured data that is costly to regenerate: map to something that is supported by HDF5, and for long-term storage write a lot of relevant metadata, including strings with explanations, and hope for the best.
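For the first approach, a minimal round trip with the `Serialization` standard library might look like this. The data is a made-up example, and an in-memory `IOBuffer` stands in for a file.

```julia
using Serialization

# Made-up example data; any Julia value can be serialized, including
# nested containers and user-defined types.
data = Dict(:x => [1.0, 2.0, 3.0], :label => "run 1")

io = IOBuffer()
serialize(io, data)          # write Julia's native binary format
seekstart(io)                # rewind before reading
restored = deserialize(io)   # read the value back

println(restored == data)    # prints "true"
```

The same functions also accept a file name (`serialize("cache.jls", data)`, `deserialize("cache.jls")`, available since Julia 1.1). Because the format follows Julia's own type definitions, a file can become unreadable after those types change, so it suits data you can regenerate rather than long-term archives.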


I have not said automatic. Currently the compound datatype support in HDF5.jl is pretty limited. One gets a binary blob when reading compound data.

FWIW, I always used/saw JLD as HDF5 + convenience conversions. I’d be fine if we restrict the convenience part to a basic subset of types, ComplexF64 being the most important I guess.
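As an example of an explicit, restricted conversion, the Dicts mentioned earlier in the thread could be split into two parallel arrays, which map onto two plain HDF5 datasets. This is a sketch of one possible convention, not an existing HDF5.jl feature. `dict_to_columns` and `columns_to_dict` are hypothetical names, and the actual HDF5 write and read are left out.

```julia
# Split a Dict into sorted, parallel key/value vectors; each vector can be
# stored as an ordinary HDF5 dataset (e.g. "keys" and "values").
function dict_to_columns(d::AbstractDict{K,V}) where {K,V}
    ks = sort!(collect(keys(d)))   # sorting makes the on-disk order stable
    vs = V[d[k] for k in ks]
    return ks, vs
end

# Rebuild the Dict after reading the two vectors back.
function columns_to_dict(ks::AbstractVector, vs::AbstractVector)
    length(ks) == length(vs) || throw(ArgumentError("keys and values differ in length"))
    return Dict(zip(ks, vs))
end
```

Keeping keys and values as separate homogeneous arrays means each one has a direct HDF5 counterpart (strings and Float64 here), with nothing Julia-specific left in the file.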

What should we use then?
JLD2, HDF5, Feather, Parquet, JLD, fst…?

Some people say JLD leaks memory.

It would be great to have something like TileDB, or something fast and scalable that can add rows and columns on disk.

Perhaps between the seven of us we can work to fix and improve JLD2. Open source projects like Julia run on volunteers, after all. We just need to get familiar with the code and have someone reliable with write access to merge the PRs.


I detailed my suggestions above. TL;DR: HDF5 or serialization.

I am afraid that the problem is more difficult than this. Julia has a very rich type system which also maps to low-level building blocks very efficiently, and mapping complex nested data into any representation while keeping it fast is inherently very difficult. Almost all similar packages ran into this broad issue.

And while being able to save and load very complex data effortlessly and in a way that promises long-run compatibility is inherently appealing, I am not sure it is possible or even desirable. Again, for short-term storage we have Serialization, and for long-term storage, can we guarantee that something like

```julia
NamedTuple{(:a, :b), Tuple{Union{Missing,Integer}, AbstractString}}[(a = 1, b = "foo")]
```

will even make sense 5 years from now? If no, do we want to reconstruct the exact container types, or should they not matter?

I find it worthwhile to map the data to arrays of primitives like Float64 and save those. For this, HDF5 is perfect.

I agree about the complex types. At the moment, we rely on FlatBuffers instead of HDF5, as the memory leak problem was on the HDF5 side.
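A minimal sketch of that approach: a vector of plain structs is split into one primitive array per field, and each array can then be written as an ordinary HDF5 dataset. `Measurement`, `to_columns`, and `from_columns` are hypothetical names; the sketch assumes the struct's fields are primitive types and that it has the default constructor.

```julia
# Hypothetical record type with primitive fields only.
struct Measurement
    t::Float64
    value::Float64
    channel::Int64
end

# One plain, concretely typed array per field, keyed by field name.
function to_columns(v::AbstractVector{T}) where {T}
    fields = fieldnames(T)
    cols = Tuple(collect(fieldtype(T, f), (getfield(x, f) for x in v)) for f in fields)
    return NamedTuple{fields}(cols)
end

# Rebuild the records from the columns using T's default constructor.
from_columns(::Type{T}, cols::NamedTuple) where {T} =
    [T(vals...) for vals in zip(values(cols)...)]
```

What ends up on disk is just named Float64 and Int64 arrays, which stay readable even if the Julia struct is later renamed or changed.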

Edited - possibly misleading

I think this is a very promising step forward: HDF5.jl as the low-level interface, and then some HDF5Extension.jl where we implement automatic or semi-automatic conversion rules. But in contrast to JLD, HDF5Extension.jl would only support conversions that we are totally sure will work independently of the Julia version. For instance, here is code to store complex arrays:

It uses the same convention as Octave, where the real and imaginary parts are put into a compound datatype. So everything is allocation-free and laid out in memory exactly as the Julia array is.
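This is not the code from the post. It is an independent sketch (assuming Julia 1.6 or later for `reinterpret(reshape, ...)`) of why such a layout can be allocation-free. A `ComplexF64` array already stores each element as an adjacent real and imaginary `Float64`, so it can be viewed as a 2×N `Float64` matrix without copying.

```julia
# Each ComplexF64 occupies two consecutive Float64s: (re, im).
z = ComplexF64[1 + 2im, 3 - 4im, 5.5 + 0im]

# View the same memory as a 2×3 Float64 matrix; no data is copied.
# Row 1 holds the real parts, row 2 the imaginary parts.
v = reinterpret(reshape, Float64, z)

# And back: view the 2×N Float64 matrix as a vector of complex numbers.
z2 = reinterpret(reshape, ComplexF64, v)
```

A compound type with two consecutive Float64 members should have the same byte layout, which is what lets the buffer be handed over as-is.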
