The documentation for Serialization is pretty sparse, and I am wondering if I am using it correctly. I will write up what I learn here as a PR for the docs.
I want to serialize multiple objects to a stream from an iterator. Should I construct a Serializer, then write them individually?
Do I need a header if I keep the object in memory (non-persistent)?
The following works:
using Serialization

function writedata(itr)
    io = IOBuffer()
    s = Serializer(io)
    Serialization.writeheader(s)
    for elt in itr
        serialize(s, elt)
    end
    take!(io)
end

function readdata(data)
    io = IOBuffer(data)
    while !eof(io)
        @show deserialize(io)
    end
end
julia> data = writedata((42, "a fish", Float64(π)));
julia> readdata(data)
deserialize(io) = 42
deserialize(io) = "a fish"
deserialize(io) = 3.141592653589793
I usually serialize/deserialize entire tuples containing all objects I want to send or store. This requires no extra machinery. I guess you could collect your iterator and do the same?
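A minimal sketch of the tuple approach, assuming everything fits in memory at once. The variable name `restored` is only illustrative:

```julia
using Serialization

# Pack all values into one tuple and serialize it with a single call.
io = IOBuffer()
serialize(io, (42, "a fish", Float64(π)))

# Rewind the buffer and read the whole tuple back in one call.
seekstart(io)
restored = deserialize(io)
```

Because each value is written and read by exactly one serialize/deserialize pair, no explicit Serializer object is needed here.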
No, I specifically don’t want to collect. I am dealing with a large amount of data, obtained online and with an unknown eltype, and I found that the serialized stream compresses very efficiently.
Should this also work for serializing custom structs?
I tried with this simple example but get an error when trying to deserialize.
Code as above with this:
struct Foo
    a::Float64
end
itr = [ Foo(i) for i = 1:3 ]
data = writedata(itr)
readdata(data)
It seems to deserialize the first element successfully but errors on the second.
Julia-1.1.0> readdata(data)
deserialize(io) = Foo(1.0)
ERROR: KeyError: key 0 not found
Stacktrace:
[1] getindex at .\abstractdict.jl:599 [inlined]
[2] handle_deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Int32) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:764
[3] deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:731
[4] handle_deserialize(::Serializer{Base.GenericIOBuffer{Array{UInt8,1}}}, ::Int32) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:778
[5] readdata(::Array{UInt8,1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Serialization\src\Serialization.jl:731
[6] top-level scope at none:0
The data was written with a single Serializer object, so the stream can contain shared references between values. It should therefore be deserialized with a single Serializer as well.
Note that deserialize(io::IO) creates a new Serializer object on each call, which causes the error above.
readdata should be:
function readdata(data)
    io = IOBuffer(data)
    s = Serializer(io)
    while !eof(io)
        @show deserialize(s)
    end
end
Ref: Error deserializing multiple values of custom struct · Issue #31337 · JuliaLang/julia · GitHub
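For reference, here is a self-contained sketch of the full round trip with the custom struct, using one Serializer on each side. The name `readall` is hypothetical, and it returns the values in a vector instead of printing them only so the result is easy to check:

```julia
using Serialization

struct Foo
    a::Float64
end

# Write every element of `itr` through one shared Serializer.
function writedata(itr)
    io = IOBuffer()
    s = Serializer(io)
    Serialization.writeheader(s)
    for elt in itr
        serialize(s, elt)
    end
    take!(io)
end

# Read every value back through one shared Serializer, so that
# references recorded earlier in the stream can still be resolved.
function readall(data)
    io = IOBuffer(data)
    s = Serializer(io)
    out = Any[]
    while !eof(io)
        push!(out, deserialize(s))
    end
    out
end

data = writedata(Foo(i) for i in 1:3)
result = readall(data)
```

For a large stream you would process each value inside the loop rather than pushing it into a vector; the point is only that `s` is created once and reused for every `deserialize` call.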