Serializing multiple values

The documentation for Serialization is pretty sparse, and I am wondering if I am using it correctly. I will write up what I learn here as a PR for the docs.

I want to serialize multiple objects to a stream from an iterator. Should I construct a Serializer, then write them individually?

Do I need a header if I keep the object in memory (non-persistent)?

The following works:

using Serialization

function writedata(itr)
    io = IOBuffer()
    s = Serializer(io)
    Serialization.writeheader(s)
    for elt in itr
        serialize(s, elt)
    end
    take!(io)
end

function readdata(data)
    io = IOBuffer(data)
    while !eof(io)
        @show deserialize(io)
    end
end

julia> data = writedata((42, "a fish", Float64(π)));

julia> readdata(data)
deserialize(io) = 42
deserialize(io) = "a fish"
deserialize(io) = 3.141592653589793

I usually serialize/deserialize entire tuples containing all objects I want to send or store. This requires no extra machinery. I guess you could collect your iterator and do the same?
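
For reference, the tuple approach described above can be sketched as follows. This is only a minimal illustration: the helper name roundtrip_tuple is made up for this example, and it assumes all values fit in memory at once.

```julia
using Serialization

# Hypothetical helper: write one tuple holding every value, then read it back.
function roundtrip_tuple(values::Tuple)
    io = IOBuffer()
    serialize(io, values)   # a single call on the IO writes the header and the whole tuple
    seekstart(io)           # rewind the buffer before reading
    return deserialize(io)  # a single call reads the whole tuple back
end

roundtrip_tuple((42, "a fish", Float64(π)))
```

Because everything goes through one serialize call and one deserialize call, there is no Serializer object to manage, at the cost of materializing all values first.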

No, I specifically don’t want to collect. I am dealing with a large amount of data, obtained online and with an unknown eltype, and I found I can compress the serialized stream very efficiently.

Should this also work for serializing custom structs?

I tried with this simple example but get an error when trying to deserialize.

I used the code above, plus this:

struct Foo
    a::Float64
end

itr = [ Foo(i) for i = 1:3 ]
data = writedata(itr)
readdata(data)

It seems to deserialize the first element successfully but errors on the second.

Julia-1.1.0> readdata(data)
deserialize(io) = Foo(1.0)
ERROR: KeyError: key 0 not found
Stacktrace:
 [1] getindex at .\abstractdict.jl:599 [inlined]
 [2] handle_deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Int32) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:764
 [3] deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:731
 [4] handle_deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Int32) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:778
 [5] readdata(::Array{UInt8,1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:731
 [6] top-level scope at none:0

The data was written with a single Serializer object, so later values in the stream can contain references to things written earlier (shared references). The data should therefore be deserialized with a single Serializer object too.

Note that deserialize(io::IO) creates a new Serializer object each time, which causes the error above.

readdata should be:

function readdata(data)
    io = IOBuffer(data)
    s = Serializer(io)
    while !eof(io)
        @show deserialize(s)
    end
end

Ref: https://github.com/JuliaLang/julia/issues/31337
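
Putting the pieces together, here is a self-contained round trip for the Foo example. It is a sketch rather than a definitive recipe: writedata is the function from the first post, while readall is a name introduced here for a variant of readdata that collects the values into a vector instead of printing them.

```julia
using Serialization

struct Foo              # same definition as in the question
    a::Float64
end

# Writer from the first post: one Serializer for the whole stream.
function writedata(itr)
    io = IOBuffer()
    s = Serializer(io)
    Serialization.writeheader(s)
    for elt in itr
        serialize(s, elt)
    end
    return take!(io)
end

# Hypothetical variant of readdata: one Serializer for the whole stream,
# so references to earlier values can be resolved.
function readall(data)
    io = IOBuffer(data)
    s = Serializer(io)
    out = Any[]
    while !eof(io)
        push!(out, deserialize(s))
    end
    return out
end

readall(writedata(Foo(i) for i in 1:3))
```

Note that writedata takes a generator here, so nothing is collected on the writing side; only the reader builds a vector, and that part can be replaced by whatever per-element processing is needed.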