Is it possible to predict the serialized size of an object, without serializing it?

Hello all!
I am pretty much brand new to Julia, and I am trying to write an interface between Julia and a C library that uses MPI. The library needs to pack Julia objects into a buffer and unpack them on the other end, and it expects the user to provide three functions for this:

  1. tells the library the size of the object, in bytes, and takes no arguments
    bufsize()
  2. packs the object into a buffer of the given size (allocated by the library)
    packbuff(obj::Ptr{Cvoid}, buffer::Ptr{Cvoid})
  3. unpacks an object from a buffer of the given size and stores a reference in obj_ptr
    unpackbuff(obj_ptr::Ptr{Ptr{Cvoid}}, buffer::Ptr{Cvoid})

The library calls (1.) and allocates a void buffer of that size. It then passes the buffer and a pointer to a Julia reference to (2.), sends the buffer via MPI to another processor, and finally passes the buffer and a NULL pointer to (3.), which unpacks the buffer into a Julia object and stores a reference in obj_ptr.

The frustrating part is that during the call to bufsize I don’t have access to the object being packed. I keep a globally scoped IdDict that stores references to the Julia objects so they aren’t garbage collected, but I can’t know which object is going to be passed. The expectation is that the objects all have the same size in memory, or that a maximum possible size is returned.
Here is an MWE:

// my_lib.c
#include <stdlib.h>
#include <stdio.h>
#include "my_lib.h"

int lib_main(lib_vec u, lib_vec v, lib_ptf_pack pack, lib_ptf_unpack unpack, lib_ptf_buffsize buffsize)
{
    int size = 0;
    buffsize(&size);
    void *buffer = malloc(size);
    pack(u, buffer);
    unpack(v, buffer); // u should now equal v!
    free(buffer);
    return 0;
}

// my_lib.h
#ifndef _MYLIB_H_
#define _MYLIB_H_

// the real library is written like this, where lib_vec_struct simply wraps the user's data structure
struct lib_vec_struct;
typedef struct lib_vec_struct *lib_vec;

typedef int (*lib_ptf_pack)(lib_vec vec, void* buffer);

typedef int (*lib_ptf_unpack)(lib_vec vec, void* buffer);

typedef int (*lib_ptf_buffsize)(int *size_ptr);

extern int lib_main(lib_vec u, lib_vec v, lib_ptf_pack pack, lib_ptf_unpack unpack, lib_ptf_buffsize buffsize);

#endif

# juliaInterface.jl
using Serialization: serialize, deserialize

mutable struct my_vec
    data
end

v = my_vec([1, 2, 3, 4])
u = my_vec(zeros(4))
# Here, I'm having to serialize an object just to find out
# what its serialized size is
buffer = IOBuffer()
serialize(buffer, v)
bufsize = buffer.size

function my_pack!(vec, buffer::Ptr{Cvoid})::Cint
    println("pack")
    vec_ref = unsafe_pointer_to_objref(vec)
    data_arr = unsafe_wrap(Vector{UInt8}, Base.unsafe_convert(Ptr{UInt8}, buffer), bufsize)
    buff = IOBuffer(data_arr, write=true, maxsize=bufsize)
    serialize(buff, vec_ref)
    return 0
end

function my_unpack!(vec, buffer::Ptr{Cvoid})::Cint
    println("unpack")
    vec_ref = unsafe_pointer_to_objref(vec)
    data_arr = unsafe_wrap(Vector{UInt8}, Base.unsafe_convert(Ptr{UInt8}, buffer), bufsize)
    buff = IOBuffer(data_arr, read=true, maxsize=bufsize)
    show(buff)
    # unpack the buffer into the julia struct, then shallow copy the struct
    vec2_ref = deserialize(buff)
    vec_ref.data .= vec2_ref.data

    return 0
end

function my_buffsize!(size::Ptr{Cint})::Cint
    println("size")
    unsafe_store!(size, bufsize)
    return 0
end

pack_c = @cfunction(my_pack!, Cint, (Ptr{Cvoid}, Ptr{Cvoid}))
unpack_c = @cfunction(my_unpack!, Cint, (Ptr{Cvoid}, Ptr{Cvoid}))
buffsize_c = @cfunction(my_buffsize!, Cint, (Ptr{Cint},))

ccall((:lib_main, "./mylib.so"), Cint, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cvoid}),
      pointer_from_objref(u), pointer_from_objref(v), pack_c, unpack_c, buffsize_c)
# now u and v should both be filled with zeros

I am just wondering if there is a consistent way to predict the serialized size without having to construct an example object, serialize it, and store that size in global scope. The documentation for Serialization.serialize mentions that an 8-byte header is written before the data, so this should be predictable, but in my tests the difference between the size I expect and the actual size of the buffer varies. I’m really stumped on this.

I can only speculate, but I think the upper bound is Base.summarysize().

Yes! This works! I just need to use Base.summarysize(obj) + 8 then. Thank you so much.

Actually, I ran into a case where this doesn’t work:

julia> using Serialization: serialize

julia> mutable struct foo
       bar
       end

julia> x = foo(1.)
foo(1.0)

julia> buffer = IOBuffer(); serialize(buffer, x)

julia> buffer.size
33

julia> Base.summarysize(x) + 8
24

What’s going on?

summarysize is an estimate of the amount of space the object takes in memory, which is not always the same as its size when serialised.
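
To illustrate the difference, here is a small sketch. The helper serialized_bytes and the two struct names are made up for this example. Both types have the same memory layout, so summarysize should report the same number for each, but the serialized stream also carries information about the type itself (such as its name), so the serialized sizes should differ.

```julia
using Serialization: serialize

# Hypothetical helper: serialize into a throwaway IOBuffer and report the byte count.
function serialized_bytes(obj)
    io = IOBuffer()
    serialize(io, obj)
    return position(io)  # number of bytes written so far
end

# Two hypothetical types with identical fields but names of different length.
mutable struct Short
    bar
end

mutable struct MuchLongerTypeNameThanShort
    bar
end

a = Short(1.0)
b = MuchLongerTypeNameThanShort(1.0)

# In-memory estimate: depends on the fields, not on the type name.
println("summarysize: ", Base.summarysize(a), " vs ", Base.summarysize(b))

# Serialized size: includes type information, so the longer name costs more bytes.
println("serialized:  ", serialized_bytes(a), " vs ", serialized_bytes(b))
```

So any formula of the form summarysize(obj) + constant is unlikely to hold across different types.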

I think the only way to reliably get the serialised size is to serialise it. To avoid using memory or disk, you could write your own IO type similar to devnull which drops anything written but keeps track of the number of bytes written.
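
A minimal sketch of that idea, assuming it is enough to overload the single-byte write and unsafe_write methods for the new type. The names CountingIO and serialized_size are made up here and are not part of Base or Serialization.

```julia
using Serialization: serialize

# Hypothetical write-only IO type: discards the data but remembers how many
# bytes were written to it.
mutable struct CountingIO <: IO
    nbytes::Int
end
CountingIO() = CountingIO(0)

# Single-byte writes (used for the serializer's tags, for example).
function Base.write(io::CountingIO, ::UInt8)
    io.nbytes += 1
    return 1
end

# Bulk writes (strings, arrays, multi-byte numbers) go through unsafe_write.
function Base.unsafe_write(io::CountingIO, ::Ptr{UInt8}, n::UInt)
    io.nbytes += Int(n)
    return Int(n)
end

# Hypothetical helper: serialized size of obj without keeping the bytes around.
function serialized_size(obj)
    io = CountingIO()
    serialize(io, obj)
    return io.nbytes
end

# Same type as in the REPL session above.
mutable struct foo
    bar
end

x = foo(1.0)

# Cross-check against a real IOBuffer; the two counts should agree.
buffer = IOBuffer()
serialize(buffer, x)
println("CountingIO: ", serialized_size(x), ", IOBuffer: ", position(buffer))
```

This still walks the whole object, so it costs roughly as much time as a real serialize call; it only avoids allocating the output. For the bufsize callback, one option would be to return the maximum of serialized_size over the objects held in the global IdDict.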

Ooh, thank you for this. That’s a great idea!