Surprising deserialization

This is more of a heads-up for people using serialize and deserialize from Serialization than anything else. It’s very clear that serialization isn’t meant to be used this way (see here), but I thought it would be worth posting:

using Dates, Serialization
struct A
    v::Nanosecond
end
a1 = A(Nanosecond(1))
serialize("tmp.dat", a1)
a2 = deserialize("tmp.dat") # A(1 nanosecond)

# new session

using Dates, Serialization
struct A
    v::Millisecond
end
a3 = deserialize("tmp.dat") # A(1 millisecond)

Similarly, if a package is updated and they change the layout of a struct you have saved, you’re in for some wacky behavior.

I am about to update my own package in that exact manner… Thankfully I have two users (one is me), so handling this should be straightforward… But yeah, that’s how I discovered this. Thank god I tested stuff first.
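For anyone in the same spot, here is a minimal sketch of one way to sidestep the problem: write only built-in values plus a version number, and rebuild the struct explicitly on load. The helpers `to_plain` and `from_plain` and the `schema` field are hypothetical names made up for this post, not part of Serialization. This still assumes the same Julia version on both sides, as the docstring requires.

```julia
using Dates, Serialization

struct A
    v::Nanosecond
end

# Hypothetical helper: reduce A to built-in values plus a schema version.
to_plain(a::A) = (schema = 1, v_ns = Dates.value(a.v))

# Hypothetical helper: rebuild A explicitly, so the unit is stated in code
# rather than inferred from whatever A looks like in the reading session.
function from_plain(p)
    p.schema == 1 || error("unsupported schema version $(p.schema)")
    return A(Nanosecond(p.v_ns))
end

path = tempname()
serialize(path, to_plain(A(Nanosecond(1))))
restored = from_plain(deserialize(path))
```

If A later switches to Millisecond, `from_plain` is the one place that has to decide how to convert the stored nanosecond count, instead of the bytes being silently reinterpreted.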

It is well-documented: ?Serialization.serialize tells you that

In general, this process will not work if the reading and writing are done by different versions of Julia, or an instance of Julia with a different system image.

Serialization is mostly for communication within the same program (for example, between its worker processes). For everything else, use HDF5, JLD2, BSON, etc.

That docstring doesn’t point out that serializing across different Julia sessions with the same sysimage might have unexpected results. And if you don’t know how serialize stores data, then it’s understandable that you might get behavior you didn’t expect, as in this example.


Good point. Perhaps a different wording, e.g. “runtime image”, would be helpful.

I tend to think of the “image” as the current state of the runtime process, but that’s probably a habit brought over from Common Lisp.


I think one way of looking at this is that the buffers saved and read by serialize and deserialize are “trusted”, i.e. deserialize gains some efficiency by not validating every aspect of the stored data. This makes sense because Serialization, if I understand it correctly, was primarily designed for IPC between Julia processes in a cluster. If it checked these sorts of things, it would probably be much too slow for that.
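If you do want that validation for files on disk, you can layer it on yourself. Below is a rough sketch, not anything Serialization provides: `layout`, `save_checked`, and `load_checked` are hypothetical helpers, and `Sample` and `Other` are made-up example structs. It stores a description of the field names and types ahead of the value and refuses to load when the description no longer matches. One caveat: it compares types by their printed names, which can depend on what is imported in the session, so treat it as a sanity check rather than a guarantee.

```julia
using Serialization

# Hypothetical helper: describe a struct's layout as (field name, field type) strings.
layout(T::DataType) = [(string(n), string(t)) for (n, t) in zip(fieldnames(T), fieldtypes(T))]

# Write the layout description first, then the value itself.
function save_checked(path, x)
    open(path, "w") do io
        serialize(io, layout(typeof(x)))
        serialize(io, x)
    end
    return path
end

# Read the stored layout and compare it with the current definition of T
# before deserializing the value.
function load_checked(path, T::DataType)
    open(path) do io
        saved = deserialize(io)
        saved == layout(T) || error("layout of $T differs from the one in $path")
        deserialize(io)
    end
end

# Made-up example types for illustration.
struct Sample
    n::Int
    label::String
end

struct Other
    n::Float64
end

path = save_checked(tempname(), Sample(1, "one"))
restored = load_checked(path, Sample)
```

Calling `load_checked(path, Other)` on that file raises an error before the value is read, which is the kind of check that serialize itself skips for speed.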