Is simply accessing an array element really allocating? (Solved)

I’m still trying to understand when Julia code triggers allocations. Is the example below really allocating? If so, why? Or is the measurement incorrect?

julia> a = [1.0];

julia> function f(x)
       x[1]
       end
f (generic function with 1 method)

julia> @btime f(a)
  14.635 ns (1 allocation: 16 bytes)
1.0

julia> @allocated f(a)
16

julia> @code_native f(a)
        .text
; ┌ @ REPL[13]:2 within `f'
; │┌ @ REPL[13]:2 within `getindex'
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0           # xmm0 = mem[0],zero
; │└
        retq
        nopl    (%rax,%rax)
; └

The function returns a value which must be boxed: the returned Float64 is 8 bytes and its type tag is another 8 bytes, hence the 16 bytes reported. However, if the function is used in a context where the value doesn’t have to be returned as a boxed object, or its use can be inlined, then no allocation needs to happen.
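A sketch of the same boxing made explicit (not from the thread; `boxed_result` is a hypothetical helper): storing the result into an `Any`-typed slot forces exactly the kind of 16-byte box described above.

```julia
# The 16 bytes match one boxed Float64: 8 bytes of data
# plus an 8-byte type tag.
f(x) = x[1]

function boxed_result(x)
    slot = Ref{Any}(0)   # untyped container: anything stored here must be boxed
    slot[] = f(x)        # the Float64 result gets heap-boxed at this point
    return slot[]
end

boxed_result([1.0])
```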

2 Likes

If you interpolate the global variable a into the benchmarking expression, @btime shows no allocation:

julia> @btime f($a)
  1.495 ns (0 allocations: 0 bytes)
1.0
3 Likes

So why doesn’t every function that returns a value report an allocation when measured in the REPL?

For example, if I change the example above from Vector{Float64} to Vector{Int64}, then it reports zero allocations.

Ahh… that’s right. I forgot about interpolation. Thanks!

There’s a cache of small integer objects. If you return a larger integer value you’ll see that allocation is required again.

1 Like

This returns a float though? And also

julia> const a = [100000000000]
1-element Array{Int64,1}:
 100000000000

julia> function f(x)
           x[1]
       end;

julia> @allocated f(a)
0

Yes, that’s an excellent riddle. Seems to have something to do with the const :smile:.

The moral of the story is: @allocated does not lie; it reports what Julia actually allocates. What Julia actually does may be trickier than you think, but it’s not worth sweating a few tens of bytes here and there unless you want to go down a rabbit hole.

This is entirely to do with how BenchmarkTools treats expressions and global variables.

When you don’t interpolate and just ask for @btime f(a), then BenchmarkTools measures the performance as though you wrote f(a) directly inside some function. Note, though, that a is a global and it’s not a constant — so this is a type instability! When you flag a by interpolating it with a $, BenchmarkTools treats it as though it were an argument to that function. It becomes a type-stable local variable in the benchmarking loop.

So then you can see the extra optimization we have for small integers in such a type-unstable case. It doesn’t show up in Kristoffer’s experiment above because he made his global a const (so it’s no longer type-unstable) and tested it with @allocated, which works differently and wouldn’t show a type-instability in the arguments.

julia> a = [1.0]
1-element Array{Float64,1}:
 1.0

julia> @btime f(a)
  35.767 ns (1 allocation: 16 bytes)
1.0

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1.0

julia> a = [1]
1-element Array{Int64,1}:
 1

julia> @btime f(a)
  26.699 ns (0 allocations: 0 bytes)
1

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1

julia> a = [1000000]
1-element Array{Int64,1}:
 1000000

julia> @btime f(a)
  36.426 ns (1 allocation: 16 bytes)
1000000

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1000000
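One way to see the instability directly (a sketch; `g` is a hypothetical wrapper, and the `@code_warntype` behavior is summarized in the comments rather than reproduced):

```julia
f(x) = x[1]

a = [1.0]      # non-const global: could be rebound to any type later
g() = f(a)     # same shape as the un-interpolated benchmark body

# julia> @code_warntype g()
# flags the result as `::Any`, because the compiler cannot assume
# anything about the type of the non-const global `a`.
g()
```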
6 Likes