Future-proof way to save and load data

BioTurboNick · May 8, 2020, 11:49pm

I’m frustrated with JLD2, which I thought was the way to save/load data, but its reconstructed types aren’t playing well with my types.

But I just ran across a post from a year ago suggesting JLD2 is effectively abandoned?

What’s the recommended way to save and load data? And is this recommendation documented somewhere for new people?

hendri54 · May 8, 2020, 11:59pm

I have not found a reliable way of saving / loading user defined types. What works for me currently is JLD2 with a couple of tweaks.

I keep file sizes as small as possible, often saving parts of objects to multiple files.

For simple but large objects that can be saved as Dicts of simple objects, I use JSON3.

Hopefully someone else has a better answer.

clinton · May 9, 2020, 12:13am

I’ve found combining serialization and compression with CodecZstd to be reasonably efficacious for saving large arbitrary data structures:

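using CodecZstd, Serialization

# Serialize `obj` and write it to path `p` through a Zstd compressor.
function OUT_BIN_STREAM(p::String, obj; level=1)
  io = ZstdCompressorStream(open(p, "w"), level=level)
  try
    serialize(io, obj)
  finally
    close(io)  # closing the compressor stream also closes the file
  end
end

# Read path `p` through a Zstd decompressor and deserialize the object.
function IN_BIN_STREAM(p::String)
  local obj
  open(ZstdDecompressorStream, p) do io
    obj = deserialize(io)
  end

  return obj
end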


obj = [rand(2,3), rand(1:4, 3)]
OUT_BIN_STREAM("test.jls",obj)
objin = IN_BIN_STREAM("test.jls")
println(objin)

out:

Array[[0.5668242695342034 0.3817159295340431 0.675236515669956; 0.2899742178438791 0.05402216513805813 0.9520545695006444], [3, 4, 2]]

However, I think future versions of Julia are allowed to break the serialization format, so it’s only future-proof if you will still have access to older versions of Julia (which are fortunately available).
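
One way to act on that caveat is to record which Julia version wrote the file, so you know which release to install if a later one can't read it. This is only a sketch: `save_with_version`, `load_with_version` and the `.version` sidecar file are names made up for illustration, not part of Serialization.

```julia
using Serialization

# Hypothetical helper: serialize `obj` and note the Julia version in a
# plain-text sidecar file next to it.
function save_with_version(path::AbstractString, obj)
    serialize(path, obj)
    write(path * ".version", string(VERSION))
    return path
end

# Hypothetical helper: warn if the file was written by a different
# major.minor release, then try to deserialize it anyway.
function load_with_version(path::AbstractString)
    saved = VersionNumber(strip(read(path * ".version", String)))
    if (saved.major, saved.minor) != (VERSION.major, VERSION.minor)
        @warn "File was written with Julia $saved, reading with $VERSION"
    end
    return deserialize(path)
end

path = tempname()
obj = [rand(2, 3), rand(1:4, 3)]
save_with_version(path, obj)
objin = load_with_version(path)
```

The sidecar is plain text on purpose, so it stays readable even if the serialized file itself does not.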

Edit: typo

xiaodai · May 9, 2020, 1:17am

You can try JDF.jl but I am planning a big update for v0.3.

I am trying to write a Parquet writer, so that may be another avenue. I found Feather.jl to be quite good for most cases, but the Feather v2 format is coming out and I am not sure if Feather.jl supports it. The old Feather format, however, can still be read by all Feather readers, as Wes McKinney has promised.

xiaodai · May 9, 2020, 1:18am

That won’t work well in JDF, unless your type is serializable.

benninkrs · May 9, 2020, 3:39am

Have you tried BSON.jl? That has worked well for me (though I haven’t tried very complicated user types with it).

Tamas_Papp · May 9, 2020, 7:58am

If you are concerned about reading it back in a decade or so, write a simple conversion routine that saves it into a format composed purely of primitives (numbers, keys, strings, etc), and dump it into something simple, eg JSON.
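
A minimal sketch of such a conversion routine, assuming the third-party JSON3 package is installed. The `Experiment` type, the `format_version` field and the helper names are all hypothetical and only there to illustrate the idea:

```julia
using JSON3  # third-party package, assumed to be installed

# Hypothetical user type, for illustration only.
struct Experiment
    name::String
    temperature::Float64
    readings::Vector{Float64}
end

# Flatten the object into strings, numbers and arrays only.
# The format_version key is an assumption that lets the layout evolve later.
to_primitives(e::Experiment) = Dict{String,Any}(
    "format_version" => 1,
    "name" => e.name,
    "temperature" => e.temperature,
    "readings" => e.readings,
)

# Rebuild the object, converting explicitly because JSON numbers may come
# back as either Int64 or Float64.
function from_primitives(d::AbstractDict)
    d["format_version"] == 1 || error("unsupported format version")
    return Experiment(String(d["name"]),
                      Float64(d["temperature"]),
                      Float64.(d["readings"]))
end

save_experiment(path, e::Experiment) =
    open(io -> JSON3.write(io, to_primitives(e)), path, "w")

load_experiment(path) =
    from_primitives(JSON3.read(read(path, String), Dict{String,Any}))

e = Experiment("run-1", 21.5, [0.5, 1.25, 2.0])
path = tempname()
save_experiment(path, e)
back = load_experiment(path)
```

The file on disk is ordinary JSON, so it can be read back from any language even if the `Experiment` definition changes or disappears.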

The fundamental problem faced by all solutions that try to translate from and to native Julia types is that Julia’s types can be incredibly complex.

gdkrmr · May 9, 2020, 10:24am

In R you have the nice dput command that saves your objects as R code, which you later just have to evaluate. Really useful for Stack Overflow questions. Is there something similar in Julia?

xiaodai · May 9, 2020, 10:41am

See Equivalent to R's dput in Julia - Stack Overflow

I literally just put your question into Google and found the above.

I’m pretty sure that calling dput on an xgboost binary model object would not give the expected results, though.

gdkrmr · May 9, 2020, 10:48am

So the answer is repr:

julia> A = rand(2, 2)
2×2 Array{Float64,2}:
 0.190069  0.362938
 0.605066  0.284478

julia> repr(A)
"[0.19006937822743164 0.3629382066681701; 0.6050661475359072 0.284478027520074]"
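
To get the object back, the string can be parsed and evaluated again. A small sketch for plain arrays; the `dput`/`dget` names are borrowed from R and are not Julia functions, and since `eval` runs whatever code is in the file, this is only reasonable for files you trust:

```julia
# Hypothetical helpers named after R's dput/dget.

# Write the Julia-code representation of `x` to a text file.
dput(path::AbstractString, x) = write(path, repr(x))

# Read the text back, parse it as Julia code and evaluate it.
# Warning: this executes the file's contents.
dget(path::AbstractString) = eval(Meta.parse(read(path, String)))

A = rand(2, 2)
path = tempname()
dput(path, A)
B = dget(path)
```

This round-trips Float64 arrays because `repr` prints enough digits to recover each value. Types whose printed form is not valid Julia code will not survive it.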

Tamas_Papp · May 9, 2020, 11:26am

Note that the data types in R are much, much simpler than those in Julia. Basically vectors (of booleans, integers, floats, complex numbers, characters), with some added metadata as key-value pairs. No user-defined types, no type parameters; not even scalars, just 1-element vectors.

gdkrmr · May 9, 2020, 11:29am

Yes, the documentation of repr says that binary data is shown as UInt8 Vectors.

BioTurboNick · May 9, 2020, 4:06pm

I checked out BSON, but it seems my data is too large for it to handle.
