JLD.jl vs JLD2.jl

galenlynch · September 21, 2018, 3:15pm

What is the suggested way of saving a bunch of variables to a file at this point, JLD or JLD2? The last thread comparing the two packages was JLD2’s announcement a year ago, and people seemed split over which to use. It doesn’t seem that there have been any commits to JLD.jl in a year.

JonasIsensee · September 21, 2018, 3:17pm

JLD.jl is not yet compatible with Julia v0.7/v1.0.
If you’re on v1.0 then JLD2.jl is the way to go AFAICT.
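
As a rough sketch of what saving a handful of variables with JLD2.jl can look like, assuming the package is installed and that its `jldopen` do-block interface with string keys behaves as documented. The variable names, keys, and file path are made up for illustration:

```julia
using JLD2  # third-party package; assumed installed via `import Pkg; Pkg.add("JLD2")`

# Illustrative variables to store (hypothetical names).
x = collect(1:5)
label = "run-1"

# Write into a temporary directory so the sketch leaves nothing behind.
path = joinpath(mktempdir(), "vars.jld2")

# Store each variable under a string key.
jldopen(path, "w") do file
    file["x"] = x
    file["label"] = label
end

# Read the variables back by key.
x2, label2 = jldopen(path, "r") do file
    file["x"], file["label"]
end
```

Plain arrays and strings like these are the easy case; the corner cases discussed later in the thread concern unions with `missing`, changed type layouts, and types defined in other modules.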

galenlynch · September 21, 2018, 3:49pm

Ok thanks!

baggepinnen · September 21, 2018, 5:50pm

You can also check out BSON.jl

galenlynch · September 21, 2018, 5:55pm

Cool, thanks! What are the pros and cons of BSON? Seems like two pros are that it’d be more language independent, and that it doesn’t use HDF5 (and therefore avoids the corruption issues etc. that come with HDF5)?

baggepinnen · September 21, 2018, 6:01pm

It can also save closures and functions, which, if I’m not mistaken, JLD(2) cannot.

bkamins · September 21, 2018, 6:29pm

The only caveat is that JLD2.jl needs some work to consistently work on Julia 1.0 in all cases.

ExpandingMan · September 21, 2018, 6:37pm

Can you describe what you expect not to work? I’ve been using JLD2.jl on 1.0 for a while now and haven’t encountered any problems.

More generally, it seems like JLD.jl is deprecated in favor of JLD2.jl (at least in practice if not in name). Are there any plans to make this official and change the name of JLD2.jl to JLD.jl?

bkamins · September 21, 2018, 8:28pm

Here are the problems I think are most significant:

Handling missing: “EXCEPTION_ACCESS_VIOLATION with large Vector{Union{Missing,Int32}}” (JuliaIO/JLD2.jl issue #108) and “Zeros converted to missing when loading DataFrame with Union types” (JuliaIO/JLD2.jl issue #111)
Correctly handling the case where the underlying layout of a type changes: “Char data type is not compatible between Julia 0.7 and Julia 0.6” (JuliaIO/JLD2.jl issue #110)
Saving UnionAll: “Failure to save struct parameterized on Union containing UnionAlls.” (JuliaIO/JLD2.jl issue #109)
Handling types across modules: “Reconstructing types defined in one module inside another” (JuliaIO/JLD2.jl issue #107)

They also suggest that other problems might lurk in corner cases.

iwelch · September 30, 2018, 5:43pm

there is an irony here. I believe one lauded aspect of the JLD format over the serialization format was that it would be stable for much longer than serialization, whose formats could change with every release, and the newer versions would forget how to load the older ones.

it reminds me a little of the BBC Domesday Project (see Wikipedia), which was supposed to last another 1000 years and did not even make it stably to age 30.

for a format to be long-term stable means also that the julia package will be stable. will JLD2 be long-term stable?

ChrisRackauckas · September 30, 2018, 8:48pm

Simon was wary of registering it because, as he mentioned at JuliaCon 2017, he probably wouldn’t have the time to maintain it properly. We are now seeing the effect of that; we were properly warned. I think the bigger question is what we should do about it. Since Julia is finally stable with v1.0, it sounds like if the issues @bkamins mentions are addressed then it will be quite a good Julia v1.0 offering. In that case, it may end up stable by default after that, which could be a good thing for this kind of library.

For JLD proper, there is the pull request “Fix all julia 0.7 issues” by crbinz (JuliaIO/JLD.jl pull request #227). Personally, I think we as a community need to go to JLD2 or BSON because of the function support that they offer (this is required for any DiffEq usage of these tools), and having a common saving format is somewhat essential to making things jibe well. (But it’s always easy to mention work someone else should do haha.)

iwelch · September 30, 2018, 10:15pm

thx, chris. I did not mean to complain. I agree that it would be nice to have a permanent binary storage format. Unless the code to serialize/deserialize were backward-compatible, so that later Julia versions could still read earlier Julia data.

ChrisRackauckas · September 30, 2018, 10:23pm

Oh no worries. I put the parenthetical because I am saying a lot about what JLD, JLD2, BSON “should do” and putting no work into it myself. Serialization is an interesting mention though since I wonder how much that could change post Julia v1.0. Serialization is heavily tied to things like message passing for multiprocessing so I don’t think it could change without being breaking, so “serialization won’t break in Julia v1.x” might be a safe bet. I’d bounce that off someone who works on the internals to double check though.

kristoffer.carlsson · September 30, 2018, 10:44pm

JLD2 works fine on 1.0. What are the effects you are mentioning?

ChrisRackauckas · September 30, 2018, 10:50pm

I agree it works fine on v1.0. The effects that I am mentioning are the ones from @bkamins’s list.

JLD2 works fine and has a stability upside due to its limited activity, but that does mean that support for the latest and greatest features will lag. Some of these features, like missing, can be pretty crucial in some communities so it’s important to note that there’s really no one who finds it their duty to add features to it daily/weekly.

iwelch · October 1, 2018, 5:21am

apologies for piping in. can I summarize my understanding of Serialization vs JLD2?

Serialization could be less trustworthy as a long-term storage format, i.e., for data that one still wants to read in 10 years. That concern goes away if deserialize can still read data written by older versions, even when serialize itself changes; in that case, Serialization could serve as a viable long-term data storage format.
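
For reference, a minimal sketch of a round trip with the Serialization standard library within a single Julia session. The `Measurement` type and the file name are hypothetical, and nothing here demonstrates compatibility across Julia versions, which is the open question in this post:

```julia
using Serialization  # standard library

# Hypothetical user-defined type, only for illustration.
struct Measurement
    id::Int
    values::Vector{Float64}
end

m = Measurement(1, [0.5, 1.5, 2.5])
path = joinpath(mktempdir(), "measurement.jls")

# Write the object to disk.
open(path, "w") do io
    serialize(io, m)
end

# Read it back; the type definition must be available in the reading session.
m2 = open(deserialize, path)
```

The read side depends on a matching definition of `Measurement` being loaded, which is why changes to user-defined types between sessions matter for this approach.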

JLD2 is an alternative binary format, albeit not one that is part of the base language. Its main advantage is HDF5 writing (but not reading) compatibility. It may sometimes be faster than Serialization. However, it is not maintained by base, and has some edge-case problems that may be fixed in the future. Thus, if the maintainers lose interest, it may not be as good as a long-term storage format, either.

Putting the two together, I am wondering whether either or both are good long-term data storage formats.

bkamins · October 1, 2018, 9:25am

You should be aware that whatever binary storage format is used it has to take care of possibilities of:

different underlying infrastructure
different versions of Julia
different specification of user defined types between sessions

For serialize/deserialize to work all three things above must not change.

For JLD2.jl, AFAIK, different underlying infrastructure is currently handled correctly. Different versions of Julia can be handled, but this still requires some work (big thanks to the people who take care of this; anyone who could help here would be very welcome). Handling different specifications of user-defined types between sessions is even harder, and I do not know what the concrete plans to support it are (I am not involved in any of the packages though, so I might not know something).

In general I think it would be best if, as a community, we decided whether BSON or JLD2 is the primary long-term binary storage format and all concentrated on supporting it. Of course both are valuable, but given that writing and maintaining such an infrastructure package is difficult and not very rewarding, it would be great if at least one of them had a decent community behind it.

Tamas_Papp · October 1, 2018, 11:38am

FWIW, I would use

HDF5 or a similar stable format for anything long-term, just making use of basic types, e.g. arrays of homogeneous items (which of course means that you cannot easily use complicated composite types and constructs),
for anything else I can easily regenerate, mmap or gzipped serialize, depending on various trade-offs, with the understanding that I would have to regenerate this occasionally, and set up the infrastructure for it.
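
A small sketch of the mmap option for a homogeneous array, using only the Mmap standard library. The array size and file name are arbitrary, and the element type and dimensions are assumed to be known to the reader of the file, since raw bytes carry no metadata:

```julia
using Mmap  # standard library

# Illustrative data: a homogeneous Float64 matrix.
A = rand(Float64, 100, 3)
path = joinpath(mktempdir(), "array.bin")

# Write the raw bytes; shape and element type are NOT stored in the file.
open(path, "w") do io
    write(io, A)
end

# Map the file back as a 100×3 Float64 matrix; the caller supplies type and dims.
B = Mmap.mmap(path, Matrix{Float64}, (100, 3))
```

Because the layout lives outside the file, this suits data that can be regenerated rather than archives meant to be self-describing.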

iwelch · October 1, 2018, 4:34pm

Is the serialized data format compatible across different computers? I tried macos and linux on x86, and they were compatible. Are there any known infrastructures where they are not compatible?
are there any known instances where user-defined types can wreak havoc on serialized but not on JLD2 data?
I am thinking that the most endearing aspect of JLD2 is that it is HDF5 compatible, thus interchangeable, and thus more likely to last for longer. Fair?
If I had data that I would want to be readable in 50 years, I am thinking that even HDF5 is not half as safe a bet as almost any text-format. So, for long-term storage, even yuck-CSV and yuck-JSON may be better bets.

Tamas_Papp · October 1, 2018, 4:43pm

I don’t know.
I think it is the other way round: for a given version and OS architecture, JLD2 may be less resilient than serialization.
AFAIK JLD2 is a subset of HDF5 with special metadata: that is to say, an HDF5 reader may be able to extract all the information in some format, but it would need to be reconstructed.
Don’t know about that. Including its precursor, the HDF group has been around since the late 1980s. My problem with CSV is the lack of metadata: I can read it, but what does it mean?
