Performance of function with pre-allocated outputs

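f_alloc is not a const, so inside @btime it is looked up as a non-constant global whose type the compiler cannot assume. Interpolating it with $: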

julia> @btime $f_alloc(1.0, 2.0);
  21.310 ns (0 allocations: 0 bytes)

fixes the problem for me.

If you intend to call f_alloc from other functions without passing it around as an input, you’ll need to make it a const.

const f_alloc = test_allocations(ntest)
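
To illustrate the difference, here is a minimal sketch using only Base. `make_closure_sketch` is a hypothetical stand-in for `test_allocations` (whose definition isn't shown here), and the `call_*` and `allocs_*` helpers are made up for the example. As far as I understand, the call through the non-const global has to be dispatched dynamically and will typically allocate, while the const version should not.

```julia
# Hypothetical stand-in for test_allocations: returns a closure that
# writes into a buffer allocated once, up front.
function make_closure_sketch(n)
    buf = zeros(n)
    return (x, y) -> begin
        buf[1] = x + y   # reuse the pre-allocated buffer
        return buf[1]
    end
end

f_global = make_closure_sketch(4)        # non-const global: type can change
const f_const = make_closure_sketch(4)   # const global: type is known

# Functions that use the globals without taking them as arguments.
call_global(x, y) = f_global(x, y)
call_const(x, y) = f_const(x, y)

# Measure inside functions so that top-level effects don't muddy the numbers.
allocs_global() = @allocated call_global(1.0, 2.0)
allocs_const() = @allocated call_const(1.0, 2.0)

# Warm up first so compilation is not included in the measurement.
allocs_global(); allocs_const()

println("non-const global: ", allocs_global(), " bytes")
println("const global:     ", allocs_const(), " bytes")
```

With BenchmarkTools, `@btime $f_global(1.0, 2.0)` sidesteps the issue for the benchmark itself, but it does not help `call_global`, which still reads the non-const global on every call.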

As for the code, the main “issue” is that f_test! did not inline. You could mark it @inline.
Alternatively, just use Cthulhu.jl to descend into any functions of interest when inspecting code.
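
A rough sketch of what marking the inner function @inline could look like. `f_test_sketch!` and `make_f_alloc_sketch` are hypothetical stand-ins, since the original definitions of `f_test!` and `test_allocations` aren't shown here; the bodies are made up purely for illustration.

```julia
# Hypothetical stand-in for f_test!: fills a pre-allocated output in place.
# @inline asks the compiler to inline the body at each call site.
@inline function f_test_sketch!(out, x, y)
    out[1] = x + y
    out[2] = x * y
    return out
end

# Hypothetical stand-in for test_allocations: allocate the output once
# and return a closure that reuses it on every call.
function make_f_alloc_sketch()
    out = zeros(2)
    return (x, y) -> f_test_sketch!(out, x, y)
end

const f_alloc_sketch = make_f_alloc_sketch()

println(f_alloc_sketch(1.0, 2.0))

# To inspect a callee that did not inline, Cthulhu can step into it
# interactively (requires the Cthulhu.jl package):
#   using Cthulhu
#   @descend f_alloc_sketch(1.0, 2.0)
```

Note that @inline is a hint about code generation; it changes what you see in the typed or lowered output for the caller, but whether it affects the timings depends on the actual body of the function.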