ANN: JLD2 (JLD in pure Julia)

I’m pleased to announce that JLD2 is now ready for testing. Like JLD, it reads and writes Julia objects to disk in an HDF5-compatible format, along with metadata to reconstruct the types if the type definitions are no longer available. Unlike JLD, it does so in pure Julia, without a dependency on the HDF5 C library. For large arrays of immutables, this makes little difference to performance, but for complex data structures, JLD2 typically achieves far greater performance. In fact, it sometimes outperforms Julia’s built-in serializer. Additionally, most issues reported against JLD should be fixed in JLD2.

For now, you should be cautious when using JLD2. Although it has decent test coverage, it hasn’t received the same amount of real-world testing as JLD, so there is some possibility that it won’t be able to correctly read back the data that it writes. Until that changes, it’s best to stick to data that you can reconstruct if necessary.
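
For anyone who wants to try it, here is a minimal sketch of a write/read round trip. It assumes JLD2 is installed and uses its `jldopen` interface with dictionary-style indexing; the `Measurement` struct, the keys, and the temporary file path are made up for illustration only.

```julia
using JLD2  # assumes the JLD2 package is installed

# Hypothetical struct, used only for illustration
struct Measurement
    label::String
    values::Vector{Float64}
end

m = Measurement("run-1", [1.0, 2.5, 4.0])
path = joinpath(mktempdir(), "example.jld2")

# Write: the opened file is indexed by string keys
jldopen(path, "w") do f
    f["measurement"] = m
    f["count"] = 3
end

# Read the stored values back from the same file
m2, n = jldopen(path, "r") do f
    (f["measurement"], f["count"])
end
```

In line with the caution above, it seems sensible to compare what comes back (`m2`, `n`) against the original data before relying on the file.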

Great! Is this also meant as an HDF5.jl replacement in the long run?

A lot more work would need to be done to support the full HDF5 specification. It’s possible in principle, but not one of my goals for JLD2.

I love this.

Why not just replace JLD?

Is the plan to replace JLD with JLD2 at some point?

Basically, it needs someone who has the time and devotion to maintain this long-term.

JLD cannot save objects with fields that are functions. Can JLD2 handle that?

It can usually save functions, but closures are saved as uncallable objects (which can be fine depending on your use case).

Okay, I will give it a shot. Thanks!

So until that happens, JLD is the recommended way to save Julia objects?
And what about Base.serialize? Do I understand correctly that it is meant for more low-level things?

So should production code use Base.serialize, JLD.jl or JLD2.jl?

JLD.jl by default, Base.serialize only when you are really willing to sacrifice compatibility across versions for a small speed gain (which may or may not be significant, you have to benchmark).

I would use JLD2 over JLD any day.

Thanks for the answers!
I think we will go with JLD2 because in our case it is possible to restore the state from raw data as well if something really should go wrong.
But I hope JLD2 (as a pure Julia solution) will replace JLD once things stabilize with Julia 0.7.

There is more to it, I believe. As far as I know, JLD(2) has trouble storing data structures that in turn store functions. That is not a problem for serialization, which would be a significant reason for choosing serialization over JLD(2).

Refer to my comment (ANN: JLD2 (JLD in pure Julia) - #15 by PetrKryslUCSD): if you want to be able to store functions with the data, go with serialization.
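
To illustrate the serialization route, here is a rough sketch that stores a struct holding a function using the `Serialization` standard library (Julia 0.7 and later). The `Model` struct and the file path are hypothetical, and the round trip is only shown within a single session; as noted earlier in the thread, serialized files are not meant to be portable across Julia versions.

```julia
using Serialization  # standard library in Julia 0.7 and later

# Hypothetical struct with a function-valued field, for illustration only
struct Model
    name::String
    activation::Function
end

m = Model("demo", x -> 2x + 1)
path = tempname()

# Write the object, including its function field, to disk
open(path, "w") do io
    serialize(io, m)
end

# Read it back in the same session
m2 = open(path, "r") do io
    deserialize(io)
end

# The restored function field can still be called
result = m2.activation(3)
```

Whether a function stored this way can be loaded in a fresh session or on a different machine depends on the function and the Julia version, so it is worth testing with your own data.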