Saving and Loading Custom Types

question

#1

I’ve been having some difficulties storing custom data types, and would love to hear people’s suggestions and experiences.

In the past I’ve always used JLD to conveniently save and load custom types from disk, but recently I have consistently had problems. About 95% of the time my files would load fine, but 5% of the time (not randomly) they simply wouldn’t load, even with the same version that I saved them with. Sometimes they would load on one machine but not on another. It is possible that this was a problem of scale, as I was suddenly storing O(1-10 GB) of data whereas before it was typically << 10 MB. Even writing custom save/load wrappers did not always resolve this. The second problem I had with JLD (which JLD2 improved, but at the cost of additional bugs(?)) was performance.

As a result, I moved to “pure” HDF5 (HDF5.jl) for large arrays and JSON (JSON.jl) for custom datatypes, writing conversion routines from custom datatypes to Dict and back. This has worked very well for me so far; I have not run into any difficulties or performance problems. However, it comes at the cost of managing this conversion myself (which is of course another source of bugs). Maybe this is just the price I have to pay.
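To make the hand-written conversion concrete, here is a minimal sketch of one such pair of routines. The type `Atom`, the `"__type__"` tag, and the function names are hypothetical and not taken from any package; the sketch uses only Base, on the assumption that the resulting Dict of strings, numbers, and arrays is what would then be handed to a JSON writer.

```julia
# Hypothetical custom type, used only for illustration.
struct Atom
    element::String
    position::Vector{Float64}
end

# Convert to a Dict that holds only strings, numbers, and arrays,
# i.e. values a JSON writer can encode without further help.
to_dict(a::Atom) = Dict("__type__" => "Atom",
                        "element"  => a.element,
                        "position" => a.position)

# Rebuild the object. The explicit conversions matter because a JSON
# parser typically hands back untyped containers such as Vector{Any}.
function atom_from_dict(d::AbstractDict)
    d["__type__"] == "Atom" || error("unexpected type tag: ", d["__type__"])
    return Atom(String(d["element"]), Vector{Float64}(d["position"]))
end

a = Atom("Si", [0.0, 1.5, 2.5])
b = atom_from_dict(to_dict(a))
```

The type tag is what lets a reader pick the right constructor later; the maintenance cost mentioned above is that every new type needs its own pair of routines.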

I considered trying out other packages, e.g., BSON.jl, which might do this automatically again, but I am a bit wary now due to my previous experiences.

So I am curious

  • What mechanism / packages others use to store large arrays of custom, possibly nested, datatypes.
  • Is my experience with JLD/JLD2 unique, or have others had similar problems and I just need to revisit how I am using it?
  • Maybe a semi-automatic, “user-guided” conversion between custom types and Dict of some small set of elementary datatypes is in fact a useful general-purpose tool? If so, does such a package exist?
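As a rough illustration of what the “user-guided” conversion in the last bullet could look like, the sketch below walks a struct’s fields with reflection and uses a small registry to find the type again on reading. All names (`Cell`, `Config`, `REGISTRY`, `to_dict`, `from_dict`, the `"__type__"` tag) are hypothetical, and it only handles structs whose fields are numbers, strings, numeric arrays, or other registered structs; arrays of structs would need an extra method.

```julia
# Hypothetical nested types, used only for illustration.
struct Cell
    lengths::Vector{Float64}
    periodic::Bool
end

struct Config
    name::String
    cell::Cell
end

# Elementary values pass through unchanged.
const Elementary = Union{Number, AbstractString, AbstractArray{<:Number}}
to_dict(x::Elementary) = x

# Any other struct becomes a Dict of its fields, converted recursively.
function to_dict(x::T) where {T}
    d = Dict{String,Any}("__type__" => string(nameof(T)))
    for f in fieldnames(T)
        d[string(f)] = to_dict(getfield(x, f))
    end
    return d
end

# The "user-guided" part: types that may be read back are listed here.
const REGISTRY = Dict{String,DataType}("Cell" => Cell, "Config" => Config)

# Non-Dict values are already elementary and are returned as they are.
from_dict(x) = x

# A Dict is turned back into the registered type named by its tag,
# with each field converted to the declared field type.
function from_dict(d::AbstractDict)
    T = REGISTRY[d["__type__"]]
    args = [convert(fieldtype(T, f), from_dict(d[string(f)])) for f in fieldnames(T)]
    return T(args...)
end

c  = Config("bulk", Cell([1.0, 2.0, 3.0], true))
d  = to_dict(c)
c2 = from_dict(d)
```

The explicit registry is a deliberate choice in this sketch: it avoids evaluating type names found in a file, at the price of one line per type.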

P.S.: And before anybody suggests that I should just file bug reports with JLD/JLD2: I wasted many, many hours trying to reproduce my problems in an MWE, without success. Reproducing my bugs requires a fairly large ecosystem of Julia packages and/or several GB of data.


#2

I think so.

Serialization for “ephemeral” storage (= anything I won’t need a week from now, and/or can regenerate easily), HDF5 for long-term storage, with a lot of metadata comments (tedious, yes, but the only thing that works).
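For the ephemeral case, a minimal sketch using the `Serialization` standard library might look like this. The type `Sample` and the file name are hypothetical; the file is written to a temporary directory. The serialized format is generally not guaranteed to be readable by other Julia versions, which is why it suits short-lived data only.

```julia
using Serialization  # Julia standard library

# Hypothetical custom type, used only for illustration.
struct Sample
    id::Int
    values::Vector{Float64}
end

# Write to a temporary location; nothing here is meant to be kept.
path = joinpath(mktempdir(), "samples.jls")
data = [Sample(i, rand(3)) for i in 1:4]

serialize(path, data)          # write the whole array of structs
restored = deserialize(path)   # read it back in a session that defines Sample
```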

No, I have problems all the time, similar to you. Sometimes I find the open issues for them, but for now I just gave up on it. This doesn’t preclude JLD2 from becoming a viable solution in the long run though, if it stabilizes.