OrderedCollections: Not binary-compatible with prior version?

I think OrderedCollections breaks “binary-compatibility” between the latest version 1.3.3 and the prior version 1.3.2.

Here’s what I mean:

I’m using version 1.3.2 and have a model where

typeof(model) == OrderedCollections.OrderedDict{String,Any}

I serialize model to file:

using Serialization
# the do-block closes the file even if serialize throws
open("c:/temp/model.jls", "w") do io
    Serialization.serialize(io, model)
end

Then update to OrderedCollections 1.3.3, restart Julia and try:

model = Serialization.deserialize("c:/temp/model.jls")

This sometimes fails with an error, but more often it crashes Julia.

So two questions:

  1. Should this be considered a bug? Or would maintaining this kind of compatibility be difficult and a brake on progress? Presumably crashing Julia should never happen, so it’s a bug to that extent…

  2. How can I “upgrade” my file “model.jls” to “model_NEW.jls”, the objective being that the new file can be deserialized with the latest OrderedCollections 1.3.3 to give the same data as the current file. Can Revise help with this? Something like:

# deserialise using OrderedCollections 1.3.2
model = Serialization.deserialize("c:/temp/model.jls")

# Switch to using OrderedCollections 1.3.3 whilst
# keeping model in memory. Is this possible?

# serialise using 1.3.3
open("c:/temp/model_NEW.jls", "w") do io
    Serialization.serialize(io, model)
end

It would be a very large obstacle to improvements if a package wasn’t allowed to change its internal types without considering it a breaking change.

Regarding crashing, the documentation string for deserialize is quite clear about it being an unsafe operation:

Read a value written by serialize. deserialize assumes the binary data read from stream
is correct and has been serialized by a compatible implementation of serialize. It has
been designed with simplicity and performance as a goal and does not validate the data
read. Malformed data can result in process termination. The caller has to ensure the
integrity and correctness of data read from stream.

Possibly it could point out that the environment also needs to be identical.
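
As for upgrading the existing file, one possible route (a sketch, not something tested against the two OrderedCollections versions in question) is to avoid putting the package's own type in the file at all. While still on 1.3.2, deserialize the model and write it back out as a plain vector of key/value pairs, which involves only Base types; after updating, read the pairs and rebuild the dictionary. The helper names export_pairs and import_pairs are made up for illustration, and the sketch assumes the values stored in the model are themselves Base types (a nested OrderedDict would need the same treatment).

```julia
using Serialization

# Hypothetical helper: run in the OLD environment, after deserializing the model.
# collect(d) turns any AbstractDict into a Vector of key => value pairs,
# keeping the dictionary's iteration order.
function export_pairs(path::AbstractString, d::AbstractDict)
    serialize(path, collect(d))
    return nothing
end

# Hypothetical helper: run in the NEW environment.
# Returns the vector of pairs; the caller rebuilds the dictionary from it.
import_pairs(path::AbstractString) = deserialize(path)
```

In the new session, something like OrderedDict(import_pairs("c:/temp/model_pairs.jls")) should then give back an ordered dictionary with the same entries in the same order (the file name here is only an example).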


Ah OK, so I should have read the docs.

I wonder whether it would be possible to create (or whether there already exists) a safer pair of functions, serialize2 and deserialize2. serialize2 could write some versioning information at the start of the file, and deserialize2 could read it and raise an error if an incompatibility is detected. Process termination is quite brutal.
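
To make the idea concrete, here is a minimal sketch of what such a pair might look like. It is an assumption-laden illustration, not an existing API: the names serialize2 and deserialize2 are the hypothetical ones from above, the MAGIC marker and the free-form tag string are invented for the example, and the caller is responsible for choosing a tag that actually captures what matters (for instance the package version). The tag is written as plain text so that it can be checked before deserialize touches any of the binary data.

```julia
using Serialization

# Invented marker line identifying files written by serialize2.
const MAGIC = "TAGGED-JLS"

# Write two plain-text header lines (marker and caller-supplied tag),
# followed by the usual serialized payload.
function serialize2(path::AbstractString, value, tag::AbstractString)
    occursin('\n', tag) && throw(ArgumentError("tag must be a single line"))
    open(path, "w") do io
        println(io, MAGIC)
        println(io, tag)
        serialize(io, value)
    end
    return nothing
end

# Check the header lines first; only call deserialize if the tag matches.
function deserialize2(path::AbstractString, expected_tag::AbstractString)
    open(path, "r") do io
        readline(io) == MAGIC || error("not a tagged file: $path")
        tag = readline(io)
        tag == expected_tag ||
            error("file was written with \"$tag\", expected \"$expected_tag\"")
        return deserialize(io)
    end
end
```

With this, a file written under the tag "OrderedCollections 1.3.2" and read back expecting "OrderedCollections 1.3.3" would raise an ordinary Julia error rather than reach deserialize. It only guards against mismatches that the tag describes; it does not validate the payload itself.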