Create `IOBuffer` from `Ptr{UInt8}` (in a `Base.@ccallable` function)

I have to generate a shared library that uses protocol buffers for receiving and sending data. So, basically a function such as

Base.@ccallable function protobuf_IO(pb_in::Ptr{UInt8})::Ptr{UInt8}
    ...
end

that is then AOT-compiled into a shared library, e.g. using juliac.

Obviously, I will use ProtoBuf.jl for this. The encoding part, i.e. how to return encoded data, should be no problem. It will be something along the lines of:

io = IOBuffer()
e = ProtoEncoder(io)
encode(e, <MessageInstance>)
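# Caution: `io` must stay reachable while the caller uses this pointer, or the GC may free the data
return pointer(e.io.data)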

But what about the other way round? The nice thing about protobuf is that I would not need to pass the length of the binary data (byte array) along, since it is all encoded in the wire format. [EDIT: That’s not the case :see_no_evil_monkey:, see my follow-up post below.] However, the only suitable constructor I can see for IOBuffer is the one requiring an actual UInt8[...] array, not a Ptr{UInt8}. To create the array, I’d need to know the length of the data and then unsafe_load each value. I also don’t want to copy the entire data, which might be quite large.

The example (and docs) only illustrate the case where there already is an io::IOBuffer, namely:

seekstart(io) # if buffer not at start yet
d = ProtoDecoder(io)
decode(d, <MessageType>)

Thanks for any help; I’m new to all that IOBuffer-related stuff and am probably just missing something here :wink:.

Ok, found “it” out myself. I was fundamentally wrong about protobufs; I thought I remembered that the binary wire format itself encodes where the message ends. But that’s not the case; it is a pure message-serialisation format, not a transport protocol. So, one has to “frame” the messages externally. Common methods:

  • Prefix in the binary data itself; e.g. prepend 4 bytes holding the total length of the message that follows. This is the most common approach and is used in gRPC, for example.
  • Separate piece of information (such as a second integer argument to a function in a shared library).
  • Terminating delimiter; not really possible in protobufs.
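As an illustration of the first option, here is a minimal sketch of length-prefix framing. The helper names `frame` and `unframe` are made up for this post, and the 4-byte little-endian prefix is just one possible choice (gRPC's actual framing differs in detail); the payload stands in for whatever ProtoBuf.jl produced.

```julia
# Hypothetical helpers: frame a payload with a 4-byte little-endian length prefix.
function frame(payload::Vector{UInt8})
    io = IOBuffer()
    write(io, htol(UInt32(length(payload))))  # length prefix, little-endian on any host
    write(io, payload)                        # the serialized message itself
    return take!(io)
end

# Read one framed message from any IO and return its payload bytes.
function unframe(io::IO)
    n = Int(ltoh(read(io, UInt32)))           # decode the length prefix
    payload = read(io, n)                     # reads at most n bytes
    length(payload) == n || throw(EOFError()) # stream ended before the message did
    return payload
end

msg = UInt8[0x08, 0x96, 0x01]                 # stand-in for an encoded message
framed = frame(msg)
roundtrip = unframe(IOBuffer(framed))
```

The receiver only needs to trust the first four bytes to know how much to read, but it still has to know that at least those four bytes are valid memory.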

But still: if I used a message serialisation format that could decide, from the information contained in the message itself, when the message is terminated, could I construct an IOBuffer without knowing the length of the data in advance? There’s not much discussion of IO and what its (minimal) interface is…
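For what it's worth, IOBuffer itself always wraps an array of known size, so the closest thing would probably be a custom `IO` subtype. Below is a rough sketch of that idea: `PtrIO` and `read_until_nul` are hypothetical names, and I am assuming that defining `read(io, UInt8)` and `eof(io)` is enough for simple byte-wise reading. It performs no bounds checks at all, so it is only as safe as the data it is pointed at.

```julia
# Hypothetical read-only stream over a raw pointer with NO known length.
mutable struct PtrIO <: IO
    ptr::Ptr{UInt8}
    pos::Int          # number of bytes consumed so far
end
PtrIO(ptr::Ptr{UInt8}) = PtrIO(ptr, 0)

# Byte-wise read; unsafe_load uses 1-based indexing and cannot check bounds.
function Base.read(io::PtrIO, ::Type{UInt8})
    b = unsafe_load(io.ptr, io.pos + 1)
    io.pos += 1
    return b
end

# The stream cannot know where the data ends, so it never reports EOF.
Base.eof(io::PtrIO) = false

# Toy "self-terminating format": read bytes until a 0x00 terminator.
function read_until_nul(ptr::Ptr{UInt8})
    io = PtrIO(ptr)
    out = UInt8[]
    while true
        b = read(io, UInt8)
        b == 0x00 && break
        push!(out, b)
    end
    return out
end

data = UInt8[0x68, 0x69, 0x00]
result = GC.@preserve data read_until_nul(pointer(data))
```

If the terminator is missing, the loop simply keeps reading past the end of the allocation.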

Relying on the encoding to read from a raw pointer, without separately knowing the length of the pointed-to data, seems very unsafe: if you have mis-encoded data, then you will be vulnerable to buffer-overrun crashes (and security exploits).
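If the length is passed alongside the pointer, a zero-copy IOBuffer is possible by wrapping the memory with `unsafe_wrap`. The sketch below assumes the exported function gets an extra length argument (here `len`, which is not in the original signature), and `wrap_input` is a hypothetical helper name.

```julia
# Hypothetical helper: view `len` bytes at `ptr` as an IOBuffer without copying.
function wrap_input(ptr::Ptr{UInt8}, len::Integer)
    ptr == C_NULL && throw(ArgumentError("null pointer"))
    len >= 0 || throw(ArgumentError("negative length"))
    # own = false: Julia will not try to free memory that belongs to the caller
    bytes = unsafe_wrap(Array, ptr, Int(len); own = false)
    return IOBuffer(bytes)   # read-only by default when constructed from an array
end

# Demo with Julia-owned memory standing in for the caller's buffer.
src = UInt8[1, 2, 3, 4]
first_three = GC.@preserve src read(wrap_input(pointer(src), 3))
```

The resulting buffer can then be handed to `ProtoDecoder(io)` as in the docs example, and reads stop at `len` bytes instead of running off the end. The caller's memory has to stay valid for as long as the buffer is in use.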