I’m still trying to understand when Julia code triggers allocations. Is the example below really allocating? If so, why? Or is the measurement incorrect?
julia> a = [1.0];
julia> function f(x)
           x[1]
end
f (generic function with 1 method)
julia> @btime f(a)
14.635 ns (1 allocation: 16 bytes)
1.0
julia> @allocated f(a)
16
julia> @code_native f(a)
.text
; ┌ @ REPL[13]:2 within `f'
; │┌ @ REPL[13]:2 within `getindex'
movq (%rdi), %rax
vmovsd (%rax), %xmm0 # xmm0 = mem[0],zero
; │└
retq
nopl (%rax,%rax)
; └
The function returns a value which must be allocated (boxed) on the heap here: the returned Float64 is 8 bytes and the type tag for it is another 8 bytes, which gives the 16 bytes reported. However, if f is used in a context where the value doesn’t have to be returned, or where the call can be inlined, then no allocation needs to happen.
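To make the 16-byte figure concrete, here is a small sketch of the arithmetic. It assumes the type tag occupies one pointer-sized word, which matches the 8 bytes quoted above on a 64-bit machine; the variable names are only for illustration.

```julia
# Size of the Float64 payload that f(a) returns.
payload_bytes = sizeof(Float64)

# Assumption: the type tag is one pointer-sized word (8 bytes on 64-bit).
tag_bytes = sizeof(Ptr{Cvoid})

# Total size of the boxed return value.
boxed_bytes = payload_bytes + tag_bytes
println(boxed_bytes)
```

On a 64-bit machine the sum is 16, the same number that @btime and @allocated report in the question.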
Yes, that’s an excellent riddle. Seems to have something to do with the const.
The moral of the story is: @allocated does not lie, it reports what Julia actually allocates. What Julia actually does may be trickier than you think, but it’s not worth sweating a few tens of bytes here and there unless you want to go down a rabbit hole.
This is entirely to do with how BenchmarkTools treats expressions and global variables.
When you don’t interpolate and just ask for @btime f(a), BenchmarkTools measures the performance as though you had written f(a) directly inside some function. Note, though, that a is a global and it’s not a constant, so this is a type instability! When you flag a by interpolating it with a $, BenchmarkTools treats it as though it were an argument to that function. It becomes a type-stable local variable in the benchmarking loop.
So then you can see the extra optimization we have for small integers in such a type-unstable case. It doesn’t show up in Kristoffer’s experiment above because he made his global a const (so it’s no longer type-unstable) and tested it with @allocated, which works differently and wouldn’t show a type instability in the arguments.
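The difference between reading a non-const global and receiving the same array as an argument can be seen without BenchmarkTools by asking the compiler what return type it infers. This is only a sketch: call_with_global and call_with_argument are hypothetical stand-ins for what the benchmark loop roughly looks like with @btime f(a) and @btime f($a), not BenchmarkTools’ actual generated code.

```julia
a = [1.0]        # non-const global, as in the question

f(x) = x[1]

# Hypothetical stand-in for `@btime f(a)`: the global is read inside a function,
# so the compiler cannot assume anything about its type.
call_with_global() = f(a)

# Hypothetical stand-in for `@btime f($a)`: the array arrives as an argument,
# so its concrete type is known when the function is compiled.
call_with_argument(x) = f(x)

# Ask inference for the return type of each variant.
global_rt = Base.return_types(call_with_global, Tuple{})[1]
argument_rt = Base.return_types(call_with_argument, Tuple{Vector{Float64}})[1]

println("global:   ", global_rt)
println("argument: ", argument_rt)
```

The first variant is inferred as Any and the second as Float64. A result typed as Any has to be handed back as a boxed value, which is consistent with the single 16-byte allocation in the non-interpolated benchmark.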