Need help understanding allocations when calling an external C library (Proj)

Hello everyone,

I have been scratching my head over allocations in a package I am developing that I can’t seem to eliminate.

Moreover, the allocation counts reported by @allocations and by @btime (from BenchmarkTools) differ.

I have tried to narrow this down to an MWE that is as small as possible while still showing the (to me) inexplicable allocations, and came up with the reduced code example below:

using BenchmarkTools
using Proj

latlon1 = (10.0, 100.0,)
latlon2 = (15.0, 140.0,)
dist, azi1, azi2 = (Ref(0.0) for _ in 1:3) |> collect

g = Ref{Proj.geod_geodesic}()
Proj.geod_init(g, 6371e3, 0.0)

# Version with output pointers as kwargs
geod_inverse_kwargs(g, lat1, lon1, lat2, lon2; dist, azi1, azi2) = Proj.geod_inverse(g, lat1, lon1, lat2, lon2, dist, azi1, azi2)

# Call first time to compile
Proj.geod_inverse(g, latlon1[1], latlon1[2], latlon2[1], latlon2[2], dist, azi1, azi2)
Proj.geod_inverse(g, latlon1..., latlon2..., dist, azi1, azi2)
geod_inverse_kwargs(g, latlon1[1], latlon1[2], latlon2[1], latlon2[2]; dist, azi1, azi2)
geod_inverse_kwargs(g, latlon1..., latlon2...; dist, azi1, azi2)

# Compute allocations on standard function, my obtained results as comment after each line
@allocations Proj.geod_inverse(g, latlon1[1], latlon1[2], latlon2[1], latlon2[2], dist, azi1, azi2)
# 4
@btime Proj.geod_inverse($g, $(latlon1[1]), $(latlon1[2]), $(latlon2[1]), $(latlon2[2]), $dist, $azi1, $azi2)
# 547.420 ns (0 allocations: 0 bytes)
@allocations Proj.geod_inverse(g, latlon1..., latlon2..., dist, azi1, azi2)
# 6
@btime Proj.geod_inverse($g, $(latlon1)..., $(latlon2)..., $dist, $azi1, $azi2)
# 550.930 ns (0 allocations: 0 bytes)
@allocations Proj.geod_inverse(g, 10.0, 100.0, 15.0, 140.0, dist, azi1, azi2)
# 0
@btime Proj.geod_inverse($g, 10.0, 100.0, 15.0, 140.0, $dist, $azi1, $azi2)
# 553.086 ns (0 allocations: 0 bytes)

# Compute allocations on kwarg function, my obtained results as comment after each line
@allocations geod_inverse_kwargs(g, latlon1[1], latlon1[2], latlon2[1], latlon2[2]; dist, azi1, azi2)
# 6
@btime geod_inverse_kwargs($g, $(latlon1[1]), $(latlon1[2]), $(latlon2[1]), $(latlon2[2]); dist=$dist, azi1=$azi1, azi2=$azi2)
# 548.511 ns (0 allocations: 0 bytes)
@allocations geod_inverse_kwargs(g, latlon1..., latlon2...; dist, azi1, azi2)
# 8
@btime geod_inverse_kwargs($g, $(latlon1)..., $(latlon2)...; dist=$dist, azi1=$azi1, azi2=$azi2)
# 551.813 ns (0 allocations: 0 bytes)
@allocations geod_inverse_kwargs(g, 10.0, 100.0, 15.0, 140.0; dist, azi1, azi2)
# 2
@btime geod_inverse_kwargs($g, 10.0, 100.0, 15.0, 140.0; dist=$dist, azi1=$azi1, azi2=$azi2)
# 553.021 ns (0 allocations: 0 bytes)

The code above just computes the distance and bearing between two points on Earth, calling the underlying function in three slightly different ways.

I also have a version where the output pointers are passed as kwargs, since that was my originally intended use, and I notice that in that case the allocation count goes up by 2 for each call.

The geod_inverse function is quite simple as it seems to just forward the arguments to @ccall:

Can someone help me understand why the allocations are present?

Edit: This is on Julia 1.9, and I see this on both Linux and Windows.
Edit2: Added signature with kwargs
Edit3: Added results for @btime

I have also tried the VSCode allocation profiler, and here are the results I get.

The call that indexes the tuple seems to allocate because of getindex:

@profview_allocs for i in 1:10^5
    Proj.geod_inverse(g, latlon1[1], latlon1[2], latlon2[1], latlon2[2], dist, azi1, azi2) # This produces 4 allocations
end

gives:

The call with splatting also adds allocations due to the GC of the 3 output Refs plus the Ref containing Proj.geod_geodesic:

@profview_allocs for i in 1:10^5
    Proj.geod_inverse(g, latlon1..., latlon2..., dist, azi1, azi2)
end

Finally, the call that does not allocate without kwargs, but does with kwargs, seems to allocate due to GC of the output Refs and of an UnknownType:

# Profile allocations on last signature with kwarg function
@profview_allocs for i in 1:10^5
    geod_inverse_kwargs(g, 10.0, 100.0, 15.0, 140.0; dist, azi1, azi2)
end

Strangely enough, I can’t see any allocations (or any significant timing difference) with @btime.
On the Linux server I run things on, I get the same results as in the commented listing above: @allocations reports between 0 and 8 allocations depending on the call signature, while @btime reports 0 allocations (and roughly 550 ns) for every call.

As pointed out to me on Zulip, the allocations reported by @allocations in this case seem to be an artifact of running the code in global scope as a script.

I should have caught that, but the weird results I was getting in my actual, more complex use case made me think it was not the issue here.

The lack of allocations with @btime should have made me realize it, though.
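
For anyone landing here later, here is a minimal sketch of how the measurement can be taken without the global-scope artifact. It does not call Proj at all: `fake_inverse!` is a hypothetical stand-in (not the real Proj.geod_inverse) that writes its outputs into Refs in a similar way, and `measure` is just an illustrative helper name. It uses @allocated (bytes) rather than @allocations only because it is available on older Julia versions too. The point is that the measured call sits inside a function, so the arguments have concrete types the compiler can infer:

```julia
# Hypothetical stand-in for Proj.geod_inverse (NOT the real function):
# it takes four Float64 inputs and writes three results into Refs.
function fake_inverse!(lat1, lon1, lat2, lon2, dist, azi1, azi2)
    dist[] = abs(lat2 - lat1) + abs(lon2 - lon1)
    azi1[] = lat1
    azi2[] = lat2
    return nothing
end

# Non-const globals, as in the script above
latlon1 = (10.0, 100.0)
latlon2 = (15.0, 140.0)
dist, azi1, azi2 = Ref(0.0), Ref(0.0), Ref(0.0)

# Illustrative helper: the measured call lives inside a function, so the
# globals arrive as arguments with concrete, inferable types.
function measure(p1, p2, dist, azi1, azi2)
    return @allocated fake_inverse!(p1..., p2..., dist, azi1, azi2)
end

measure(latlon1, latlon2, dist, azi1, azi2)          # first call compiles
bytes = measure(latlon1, latlon2, dist, azi1, azi2)  # second call measures
println("bytes allocated inside the function: ", bytes)
```

With this setup I would expect the reported number to be zero even with indexing or splatting, which matches what @btime shows once the arguments are interpolated with `$`. Declaring the globals `const` should have a similar effect, assuming nothing else in the call is type-unstable.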