Unexpected allocation when calling Julia function from C++

Hey, I’m using PackageCompiler to make a shared library and calling it from C++ with the Julia GC turned off, allocating all memory on the C++ side.

From the Julia side, I see 0 allocations:

julia> using BenchmarkTools, StaticArrays

julia> struct State
           phase::MVector{1, Cfloat}
       end

julia> @Base.ccallable function scjulia_step(s::State, in::Cfloat, gain::Cfloat)::Cfloat
           s.phase .+= 0.01f0
           s.phase .%= 1
           ϕ = SVector{1}(s.phase)
           (in + sum(sinpi.(2ϕ))) * gain
       end

julia> state = State(MVector{1}([0f0]))

julia> scjulia_step(state, 0f0, 1f0)

julia> @btime scjulia_step(state, x, y) setup=((x,y)=(rand(Cfloat),rand(Cfloat)))
  16.546 ns (0 allocations: 0 bytes)

But when I call it from C++, it leaks ~50 bytes per call:

extern "C" {
    #include "julia_init.h" //from PackageCompiler
}

#include <cstdlib>   // malloc
#include <iostream>  // std::cout

struct SCJuliaState {
    float* phase;
};

extern "C" float scjulia_step(SCJuliaState, float, float);

int main (int argc, char** argv){
    int argc_ = 1;
    const char* arg = "";
    char** argv_ = const_cast<char**>(&arg);
    init_julia(argc_, argv_);

    float* phase = (float*)malloc(sizeof(float));
    *phase = 0.0f;
    struct SCJuliaState state = {phase};

    for (int i=0; i<100000000; i++)
        std::cout << scjulia_step(state, 0.5, 0.5) << "\n";

    return  0;
}
// memory use in activity monitor shoots up to > 5GB

What’s going on here? Am I using @btime wrong, or is something weird happening when I pass the struct this way?

macOS 10.14.6 (18G9323)

$ c++ --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Julia built locally from the v1.7.1 tag

BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"