Future-proof way to save and load data

I’m frustrated with JLD2, which I thought was the way to save/load data, but its reconstructed types aren’t playing well with my types.

But I just ran across a post from a year ago suggesting JLD2 is effectively abandoned?

What’s the recommended way to save and load data? And is this recommendation documented somewhere for new people?

I have not found a reliable way of saving and loading user-defined types. What works for me currently is JLD2 with a couple of tweaks.

I keep file sizes as small as possible, often saving parts of objects to multiple files.

For simple but large objects that can be saved as Dicts of simple objects, I use JSON3.

Hopefully someone else has a better answer.

I’ve found combining serialization and compression with CodecZstd to be reasonably efficacious for saving large arbitrary data structures:

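using CodecZstd, Serialization

# Serialize `obj` and write it to the file `p` through a Zstd compressor.
function OUT_BIN_STREAM(p::String, obj; level=1)
  io = ZstdCompressorStream(open(p, "w"), level=level)
  try
    serialize(io, obj)
  finally
    close(io)  # finishes the compressed stream and closes the file
  end
end

# Read the file `p` through a Zstd decompressor and deserialize its contents.
function IN_BIN_STREAM(p::String)
  local obj
  open(ZstdDecompressorStream, p) do io
    obj = deserialize(io)
  end
  return obj
end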


obj = [rand(2,3), rand(1:4, 3)]
OUT_BIN_STREAM("test.jls",obj)
objin = IN_BIN_STREAM("test.jls")
println(objin)

out:

Array[[0.5668242695342034 0.3817159295340431 0.675236515669956; 0.2899742178438791 0.05402216513805813 0.9520545695006444], [3, 4, 2]]

However, I think future versions of Julia are allowed to break the serialization format, so this is only future-proof if you will still have access to older versions of Julia (which are fortunately available).
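
One way to make that caveat easier to live with is to record which Julia version wrote the file. This is a minimal sketch using only the Serialization standard library; the function names `save_with_version` and `load_with_version` and the plain-text header line are assumptions made up for illustration, not an established convention.

```julia
using Serialization

# Hypothetical helper: write a plain-text header with the Julia version,
# followed by the serialized object.
function save_with_version(path::AbstractString, obj)
    open(path, "w") do io
        println(io, "julia-version: ", VERSION)
        serialize(io, obj)
    end
end

# Hypothetical helper: read the header back, warn if the major/minor
# version differs from the running Julia, then deserialize the rest.
function load_with_version(path::AbstractString)
    open(path, "r") do io
        header = readline(io)
        written_by = VersionNumber(strip(last(split(header, ": "))))
        if (written_by.major, written_by.minor) != (VERSION.major, VERSION.minor)
            @warn "File was written by Julia $written_by; running $VERSION"
        end
        return written_by, deserialize(io)
    end
end
```

If deserialization fails years later, the header at least says which Julia release to install to read the file.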

You can try JDF.jl but I am planning a big update for v0.3.

I am trying to write a Parquet writer, so that may be another avenue. I found Feather.jl to be quite good for most cases, but a Feather v2 format is coming out and I am not sure whether Feather.jl supports it. The old Feather format can still be read by all Feather readers, as Wes McKinney has promised.

That won’t work well in JDF, unless your type is serializable.

Have you tried BSON.jl? That has worked well for me (though I haven’t tried very complicated user types with it).

If you are concerned about reading it back in a decade or so, write a simple conversion routine that saves it into a format composed purely of primitives (numbers, keys, strings, etc.), and dump it into something simple, e.g. JSON.
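
A minimal sketch of such a conversion routine follows. The `Measurement` type, the `format_version` key, and the helper names are all hypothetical. To keep the example runnable without extra packages it writes the primitives with the TOML standard library (Julia 1.6+) instead of JSON; the same Dict could be handed to a JSON package instead.

```julia
using TOML

# Hypothetical user-defined type, used only for illustration.
struct Measurement
    name::String
    samples::Vector{Float64}
    count::Int
end

# Flatten the type into a Dict holding only strings, numbers, and arrays.
to_primitives(m::Measurement) = Dict{String,Any}(
    "format_version" => 1,   # lets future readers detect layout changes
    "name" => m.name,
    "samples" => m.samples,
    "count" => m.count,
)

# Rebuild the type from the parsed Dict, converting each field explicitly.
function from_primitives(d::AbstractDict)
    d["format_version"] == 1 || error("unsupported format version")
    return Measurement(String(d["name"]), Float64.(d["samples"]), Int(d["count"]))
end

function save_measurement(path::AbstractString, m::Measurement)
    open(path, "w") do io
        TOML.print(io, to_primitives(m))
    end
end

load_measurement(path::AbstractString) = from_primitives(TOML.parsefile(path))
```

The file is plain text, so it stays readable even without Julia, and the explicit conversion functions are the only code that has to change when the type evolves.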

The fundamental problem faced by all solutions that try to translate from and to native Julia types is that Julia’s types can be incredibly complex.

In R you have the nice dput command that saves your objects as R code, which you later just have to evaluate. Really useful for Stack Overflow questions :) . Is there something similar in Julia?

See Equivalent to R's dput in Julia - Stack Overflow

I literally just put your question into Google and found the above :)

Pretty sure calling dput on an xgboost binary model object would not give the expected results, though.

So the answer is repr:

julia> A = rand(2, 2)
2×2 Array{Float64,2}:
 0.190069  0.362938
 0.605066  0.284478

julia> repr(A)
"[0.19006937822743164 0.3629382066681701; 0.6050661475359072 0.284478027520074]"
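
To get the object back, the string has to be parsed and evaluated again. A minimal sketch, assuming the value is built only from types whose `repr` is valid Julia code (plain arrays and Dicts of numbers and strings are; many other types are not):

```julia
A = rand(2, 2)

# repr gives Julia source text for the value
code = repr(A)

# Parsing and evaluating that text rebuilds an equal array
B = eval(Meta.parse(code))
@assert B == A

# The same round trip works for a Dict of simple values
d = Dict("a" => [1, 2, 3], "b" => [4, 5])
d2 = eval(Meta.parse(repr(d)))
@assert d2 == d
```

Only evaluate strings from a source you trust, since `eval` runs whatever code the text contains.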

Note that the data types in R are much, much simpler than Julia’s. Basically vectors (of booleans, integers, floats, complex numbers, characters), with some added metadata as key-value pairs. No user-defined types, no type parameters; not even scalars, just 1-element vectors.

Yes, the documentation of repr says that binary data is shown as UInt8 Vectors.

I checked out BSON, but it seems my data is too large for it to handle.