Customize serialization on fetch from worker

I’m working on Mongoc.jl and I would like to make BSON documents serializable between Julia workers.

mutable struct BSON
    handle::Ptr{Cvoid}
end

Since Mongoc.BSON is just a wrapper for a C handle, fetching a BSON from a worker transfers only the pointer field, not the document it refers to, so the handle does not point to a valid location on the master process.

So I created a wrapper around BSON that stores the document contents in a Vector{UInt8} buffer.

struct BufferedBSON
    bson_data::Vector{UInt8}
end

function BufferedBSON(bson::BSON)
    io = IOBuffer()
    write_bson(io, bson)
    return BufferedBSON(take!(io))
end

BSON(buff::BufferedBSON) :: BSON = read_bson(buff.bson_data)[1]

With that, the following code works just fine:

addprocs(1)
@everywhere using Mongoc

let
    f = @spawn Mongoc.BufferedBSON(Mongoc.BSON("a" => 1))
    bson = Mongoc.BSON(fetch(f))
    @test bson["a"] == 1
end

But I would like to add some syntactic sugar so that the user only deals with BSON, as in:

f = @spawn Mongoc.BSON("a" => 1)
bson = fetch(f)

Is there a way to customize the serialization primitives to hook Mongoc.BufferedBSON as a transport for serializing BSON in this setup?

I think you want to provide a definition for Serialization.serialize(s::AbstractSerializer, b::BSON), and similarly for Serialization.deserialize()?

Looking at the Serialization code, it seems that it cannot be extended. Is the solution instead to implement read(io, bson) and write(io, bson)?

It is possible, but it appears to be underdocumented and currently requires dipping into the implementation details a bit. In particular, you need to be aware of how the type information is written, and do the same in your own serialize function.

Here’s a demo of how to do it:

# Simplest possible example of a struct managing externally allocated data
mutable struct MyType
    p::Ptr{Int}
end

function MyType(i::Integer)
    p = reinterpret(Ptr{Int}, ccall(:malloc, Ptr{Cvoid}, (Csize_t,), sizeof(Int)))
    unsafe_store!(p, i)
    t = MyType(p)
    finalizer((m)->ccall(:free, Cvoid, (Ptr{Cvoid},), m.p), t)
    t
end

Now the serialization parts:

using Serialization

function Serialization.serialize(s::AbstractSerializer, m::MyType)
    Serialization.serialize_type(s, typeof(m))  # Implementation detail of Serialization module.
    # Next we are free to serialize arbitrary binary data
    write(s.io, unsafe_load(m.p))
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{MyType})
    # Read binary data from `s`
    val = read(s.io, Int)
    MyType(val)
end

As a test:

m = MyType(1234)

buf = IOBuffer()
serialize(buf, m)
seek(buf,0)

m2 = deserialize(buf)

@show m m2
@show unsafe_load(m.p) unsafe_load(m2.p)

This shows that m and m2 now hold the same content but at different addresses:

m = MyType(Ptr{Int64} @0x000056331a1915a0)
m2 = MyType(Ptr{Int64} @0x00005633196ee530)
unsafe_load(m.p) = 1234
unsafe_load(m2.p) = 1234
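
The same pair of methods should also cover the original use case, since Distributed moves values between processes with a serializer that is a subtype of AbstractSerializer. Below is a minimal sketch, assuming one local worker can be started. It repeats the MyType definitions from above inside @everywhere so that the worker has them too, and it uses Libc.malloc and Libc.free in place of the raw ccall; the variable names w, m and value are only for illustration.

```julia
using Distributed

# Start one local worker and remember its id (assumes a local worker can be spawned)
w = first(addprocs(1))

@everywhere using Serialization

# Every process needs both the type and the serialize/deserialize methods
@everywhere begin
    mutable struct MyType
        p::Ptr{Int}
    end

    function MyType(i::Integer)
        # Allocate room for one Int outside the Julia heap and store the value there
        p = convert(Ptr{Int}, Libc.malloc(sizeof(Int)))
        unsafe_store!(p, i)
        t = MyType(p)
        finalizer(m -> Libc.free(m.p), t)
        return t
    end

    function Serialization.serialize(s::AbstractSerializer, m::MyType)
        # Write the type info first so that deserialize dispatches on Type{MyType}
        Serialization.serialize_type(s, typeof(m))
        # Send the pointed-to value, not the pointer itself
        write(s.io, unsafe_load(m.p))
    end

    function Serialization.deserialize(s::AbstractSerializer, ::Type{MyType})
        # Rebuild the object with memory allocated on the receiving process
        MyType(read(s.io, Int))
    end
end

# Construct the value on the worker and bring it back to the master process
m = fetch(@spawnat w MyType(1234))
value = unsafe_load(m.p)
@show value
```

Here m is a new MyType on the master process whose pointer was allocated locally by the deserialize method, so unsafe_load(m.p) reads valid memory rather than an address that only meant something on the worker.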

Thanks @c42f! I just figured it out.

The complete solution was this:

struct BufferedBSON
    bson_data::Vector{UInt8}
end

function BufferedBSON(bson::BSON)
    io = IOBuffer()
    write_bson(io, bson)
    return BufferedBSON(take!(io))
end

BSON(buff::BufferedBSON) :: BSON = read_bson(buff.bson_data)[1]

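function Serialization.serialize(s::AbstractSerializer, bson::BSON)
    # Write the type info first so that deserialize dispatches on Type{BSON}
    Serialization.serialize_type(s, BSON)
    # Then send the document contents as a serialized BufferedBSON
    Serialization.serialize(s.io, BufferedBSON(bson))
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{BSON})
    # Read the BufferedBSON back and rebuild a BSON with a local handle
    BSON(Serialization.deserialize(s.io))
end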

Base.write(io::IO, bson::BSON) = serialize(io, bson)
Base.read(io::IO, ::Type{BSON}) = deserialize(io)::BSON

This works, but you’re writing more bytes than necessary. It’s more efficient to just use binary IO after having written the type info:

function Serialization.serialize(s::AbstractSerializer, bson::BSON)
    Serialization.serialize_type(s, BSON)
    write_bson(s.io, bson)
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{BSON})
    read_bson(s.io)  # I assume you have a read_bson returning BSON?
end

This way you also don’t need the BufferedBSON type at all.
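
To make the size difference concrete, here is a small self-contained sketch that does not depend on Mongoc. The types Wrapped, ViaWrapper and ViaRawIO and the helper roundtrip are hypothetical stand-ins: ViaWrapper serializes a nested wrapper object the way the BufferedBSON version does, while ViaRawIO writes the bytes directly after the type info. A plain byte vector does not record its own length the way a BSON document does, so the raw variant here adds an explicit Int64 length prefix.

```julia
using Serialization

# Stand-in for BufferedBSON: a plain wrapper around the raw bytes
struct Wrapped
    data::Vector{UInt8}
end

# Two hypothetical payload types that differ only in how they are serialized
struct ViaWrapper
    data::Vector{UInt8}
end

struct ViaRawIO
    data::Vector{UInt8}
end

# Variant 1: type info, then a complete nested serialization of the wrapper
function Serialization.serialize(s::AbstractSerializer, x::ViaWrapper)
    Serialization.serialize_type(s, ViaWrapper)
    serialize(s.io, Wrapped(x.data))
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{ViaWrapper})
    ViaWrapper(deserialize(s.io).data)
end

# Variant 2: type info, then a length prefix and the raw bytes
function Serialization.serialize(s::AbstractSerializer, x::ViaRawIO)
    Serialization.serialize_type(s, ViaRawIO)
    write(s.io, Int64(length(x.data)))
    write(s.io, x.data)
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{ViaRawIO})
    n = read(s.io, Int64)
    ViaRawIO(read(s.io, n))
end

# Serialize into a buffer, record how many bytes were written, and read it back
function roundtrip(x)
    buf = IOBuffer()
    serialize(buf, x)
    nbytes = position(buf)
    seekstart(buf)
    return deserialize(buf), nbytes
end

payload = collect(UInt8, 1:64)
a, n_wrapper = roundtrip(ViaWrapper(payload))
b, n_raw = roundtrip(ViaRawIO(payload))
@show n_wrapper n_raw
```

Both variants round-trip the payload, but the nested one should come out larger, because it writes a second serializer header plus the type information for Wrapped in addition to the bytes themselves.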

Yes, I also noticed this. I’ll refactor the code to do just that. Thanks a lot!