I’m working on Mongoc.jl and would like to make BSON documents serializable between Julia workers.
mutable struct BSON
    handle::Ptr{Cvoid}
end
Since Mongoc.BSON is only a wrapper for a C handle, fetching a BSON from a worker copies just the handle's pointer address, which points to an invalid location on the master process.
So I created a wrapper around BSON that converts it to a byte buffer, stored as a Vector{UInt8}.
struct BufferedBSON
    bson_data::Vector{UInt8}
end
function BufferedBSON(bson::BSON)
    io = IOBuffer()
    write_bson(io, bson)
    return BufferedBSON(take!(io))
end
BSON(buff::BufferedBSON) :: BSON = read_bson(buff.bson_data)[1]
With that, the following code works:
using Distributed, Test
addprocs(1)
@everywhere using Mongoc
let
    f = @spawn Mongoc.BufferedBSON(Mongoc.BSON("a" => 1))
    bson = Mongoc.BSON(fetch(f))
    @test bson["a"] == 1
end
But I would like to add some syntactic sugar so that the user works only with BSON, as in:
f = @spawn Mongoc.BSON("a" => 1)
bson = fetch(f)
Is there a way to customize the serialization primitives to hook Mongoc.BufferedBSON as a transport for serializing BSON in this setup?
It is possible, but it appears to be underdocumented and currently requires dipping into the implementation details a bit. In particular, you need to be aware of how the type information is written, and do the same in your own serialize function.
Here’s a demo of how to do it:
# Simplest possible example of a struct managing externally allocated data
mutable struct MyType
    p::Ptr{Int}
end
function MyType(i::Integer)
    p = reinterpret(Ptr{Int}, ccall(:malloc, Ptr{Cvoid}, (Csize_t,), sizeof(Int)))
    unsafe_store!(p, i)
    t = MyType(p)
    finalizer((m)->ccall(:free, Cvoid, (Ptr{Cvoid},), m.p), t)
    t
end
Now the serialization parts:
using Serialization
function Serialization.serialize(s::AbstractSerializer, m::MyType)
    Serialization.serialize_type(s, typeof(m)) # Implementation detail of Serialization module.
    # Next we are free to serialize arbitrary binary data
    write(s.io, unsafe_load(m.p))
end
function Serialization.deserialize(s::AbstractSerializer, ::Type{MyType})
    # Read binary data from `s`
    val = read(s.io, Int)
    MyType(val)
end
As a test:
m = MyType(1234)
buf = IOBuffer()
serialize(buf, m)
seek(buf,0)
m2 = deserialize(buf)
@show m m2
@show unsafe_load(m.p) unsafe_load(m2.p)
which shows that m and m2 now hold the same content but at different addresses.
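To connect the demo back to the worker setup in the question, here is a minimal sketch that fetches a MyType from another process. It assumes the type and both Serialization methods are defined on every process via `@everywhere`, and it uses `Libc.malloc`/`Libc.free` in place of the raw `ccall`s purely for brevity. Treat it as an illustration rather than a definitive setup.

```julia
using Distributed
addprocs(1)

@everywhere begin
    using Serialization

    # Same toy type as above: a struct managing externally allocated memory.
    mutable struct MyType
        p::Ptr{Int}
    end

    function MyType(i::Integer)
        p = convert(Ptr{Int}, Libc.malloc(sizeof(Int)))
        unsafe_store!(p, i)
        t = MyType(p)
        finalizer(m -> Libc.free(m.p), t)
        return t
    end

    # Write the type info, then the pointed-to value instead of the pointer.
    function Serialization.serialize(s::AbstractSerializer, m::MyType)
        Serialization.serialize_type(s, typeof(m))
        write(s.io, unsafe_load(m.p))
    end

    # Rebuild the object locally, allocating fresh memory on this process.
    function Serialization.deserialize(s::AbstractSerializer, ::Type{MyType})
        MyType(read(s.io, Int))
    end
end

w = first(workers())
m = fetch(@spawnat w MyType(1234))  # created on the worker, rebuilt on the master
@show unsafe_load(m.p)
```

The pointer held by the fetched `m` is allocated on the master process by the deserialize method, so it is safe to dereference there.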
function Serialization.serialize(s::AbstractSerializer, bson::BSON)
    Serialization.serialize_type(s, BSON)
    Serialization.serialize(s.io, BufferedBSON(bson))
end
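This works, but you’re writing more bytes than necessary, because the nested serialize call also writes the type information for BufferedBSON and its Vector{UInt8} field. It’s more efficient to just use binary IO after having written the type info: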
function Serialization.serialize(s::AbstractSerializer, bson::BSON)
    Serialization.serialize_type(s, BSON)
    write_bson(s.io, bson)
end
function Serialization.deserialize(s::AbstractSerializer, ::Type{BSON})
    read_bson(s.io) # I assume you have a read_bson returning BSON?
end
This way you also don’t need the BufferedBSON type at all.
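One caveat with raw binary IO: the deserialize method must consume exactly the bytes its serialize counterpart wrote, since other values may follow in the same stream. If the reader cannot tell where the payload ends (for example, if it reads until end of stream), a length prefix is one way to frame it. The sketch below uses a hypothetical `Blob` type as a stand-in for the encoded document; it is not part of Mongoc.jl.

```julia
using Serialization

# Hypothetical stand-in: `data` plays the role of the encoded document bytes.
mutable struct Blob
    data::Vector{UInt8}
end

function Serialization.serialize(s::AbstractSerializer, b::Blob)
    Serialization.serialize_type(s, Blob)
    write(s.io, Int64(length(b.data)))  # length prefix frames the payload
    write(s.io, b.data)
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{Blob})
    n = read(s.io, Int64)
    Blob(read(s.io, n))                 # read exactly n bytes, no more
end

# Two values back to back in the same stream.
buf = IOBuffer()
serialize(buf, Blob(UInt8[1, 2, 3]))
serialize(buf, "next value")
seekstart(buf)
b = deserialize(buf)
next = deserialize(buf)
@show b.data next
```

Because the reader stops after `n` bytes, the value serialized after the `Blob` is still decoded correctly.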