Is simply accessing an array element really allocating? (Solved)


#1

I’m still trying to understand when Julia code triggers allocations. Is the example below really allocating? If so, why? Or is the measurement incorrect?

julia> a = [1.0];

julia> function f(x)
       x[1]
       end
f (generic function with 1 method)

julia> @btime f(a)
  14.635 ns (1 allocation: 16 bytes)
1.0

julia> @allocated f(a)
16

julia> @code_native f(a)
        .text
; ┌ @ REPL[13]:2 within `f'
; │┌ @ REPL[13]:2 within `getindex'
        movq    (%rdi), %rax
        vmovsd  (%rax), %xmm0           # xmm0 = mem[0],zero
; │└
        retq
        nopl    (%rax,%rax)
; └

#2

The function returns a value which must be heap-allocated (boxed): the value itself is 8 bytes and its type tag is another 8 bytes, hence the 16 bytes reported. However, if the result is used in a context where it doesn’t have to be returned, or its use can be inlined, then no allocation needs to happen.
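
For instance, a minimal sketch (using the same f and a as above, plus the $-interpolation explained in the next reply): if the result is consumed inside another type-stable function, the compiler never needs to box it.

julia> g(x) = f(x) + 1.0;   # f's return value is consumed here, not boxed

julia> @btime g($a);        # should report 0 allocations, since the call is type-stable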


#3

If you interpolate the global variable a into the benchmarking expression, @btime shows no allocation:

julia> @btime f($a)
  1.495 ns (0 allocations: 0 bytes)
1.0
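
If I remember correctly, BenchmarkTools also offers a setup keyword that achieves the same thing by binding a fresh local variable for each sample, so no global is involved:

julia> @btime f(x) setup=(x = [1.0]);   # x is a local inside the benchmark, not a global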

#4

So why doesn’t every function that returns a value report an allocation when measured in the REPL?

For example, if I change the example above from Vector{Float64} to Vector{Int64}, then it reports zero allocations.
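
For reference, a sketch of what I mean (same f as before; numbers elided since they vary by machine):

julia> b = [1];

julia> @btime f(b);   # reports 0 allocations here, unlike the Float64 version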


#5

Ahh… that’s right. I forgot about interpolation. Thanks!


#6

Julia keeps a cache of boxed small-integer objects. If you return a larger integer value, you’ll see that an allocation is required again.
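
A quick way to see this in action (a sketch; the exact cache range is an implementation detail of the runtime):

julia> small = [1]; big = [100000000000];

julia> @btime f(small);   # boxed result comes from the small-integer cache: 0 allocations

julia> @btime f(big);     # outside the cache: 1 allocation (16 bytes) again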


#7

This returns a float, though? And also:

julia> const a = [100000000000]
1-element Array{Int64,1}:
 100000000000

julia> function f(x)
           x[1]
       end;

julia> @allocated f(a)
0

#8

Yes, that’s an excellent riddle. Seems to have something to do with the const :smile:.

The moral of the story: @allocated does not lie; it reports what Julia actually allocates. What Julia actually does may be trickier than you think, but it’s not worth sweating a few tens of bytes here and there unless you want to go down the rabbit hole.


#9

This has entirely to do with how BenchmarkTools treats expressions and global variables.

When you don’t interpolate and just ask for @btime f(a), BenchmarkTools measures the performance as though you had written f(a) directly inside some function. Note, though, that a is a global and not a constant, so this is a type instability! When you flag a by interpolating it with a $, BenchmarkTools treats it as though it were an argument to that function: it becomes a type-stable local variable in the benchmarking loop.

This is why you can see the extra optimization for small integers in the type-unstable case. It doesn’t show up in Kristoffer’s experiment above because he made his global a const (so it’s no longer type-unstable) and tested it with @allocated, which works differently and doesn’t expose a type instability in the arguments.

julia> a = [1.0]
1-element Array{Float64,1}:
 1.0

julia> @btime f(a)
  35.767 ns (1 allocation: 16 bytes)
1.0

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1.0

julia> a = [1]
1-element Array{Int64,1}:
 1

julia> @btime f(a)
  26.699 ns (0 allocations: 0 bytes)
1

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1

julia> a = [1000000]
1-element Array{Int64,1}:
 1000000

julia> @btime f(a)
  36.426 ns (1 allocation: 16 bytes)
1000000

julia> @btime f($a)
  2.077 ns (0 allocations: 0 bytes)
1000000
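
To complete the picture, a sketch of the const case with @btime (timings omitted since they vary; c is a fresh name here): making the global a const restores type stability even without interpolation, which is why the @allocated experiment above reported zero.

julia> const c = [100000000000];

julia> @btime f(c);   # const global: the call is type-stable, so no box and 0 allocations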