Where does this allocation pattern come from?

When I execute the following code, I get an interesting pattern:

x = collect(1:1000)
x[1]
@showtime x[511]
@showtime x[511]
@showtime x[511]
@showtime x[512]
@showtime x[512]
@showtime x[512]
@showtime x[511]
@showtime x[512]
x[511]: 0.000005 seconds
x[511]: 0.000001 seconds
x[511]: 0.000001 seconds
x[512]: 0.000001 seconds (1 allocation: 16 bytes)
x[512]: 0.000001 seconds (1 allocation: 16 bytes)
x[512]: 0.000001 seconds (1 allocation: 16 bytes)
x[511]: 0.000001 seconds
x[512]: 0.000001 seconds (1 allocation: 16 bytes)

Namely x[511] does not allocate, while x[512] does.
Why is this the case, what is going on under the hood?

PS: I am aware of the performance tips and everything becomes fast if put into a function.
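One way to narrow this down is to sweep indices around the boundary and record the allocation for each (a quick sketch, run in a fresh session; the exact cutoff is an implementation detail and could differ between Julia versions or builds):

```julia
# Sweep indices around the 511/512 boundary to see where allocation starts.
x = collect(1:1000)
x[1]  # warm up, so compilation does not pollute the numbers
for i in 509:514
    bytes = @allocated x[i]
    println(i, " => ", bytes, " bytes")
end
```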


I looked into it: @showtime calls sprint, so I thought that was causing it. That seems confirmed, but it is even stranger (how are 5+ allocations swallowed if that is done implicitly?):

julia> @time Base.sprint(Base.show_unquoted, x[512])
  0.000024 seconds (7 allocations: 280 bytes)
"512"

julia> @time Base.sprint(Base.show_unquoted, x[511])
  0.000022 seconds (5 allocations: 248 bytes)
"511"

It does this implicitly:

julia> @time s = IOBuffer(sizehint=0)
  0.000004 seconds (2 allocations: 128 bytes)

but I’m guessing that in some cases the compiler is clever enough to optimize it out?

and then for me:

julia> @time Base.show_unquoted(s, x[512])
  0.000021 seconds (3 allocations: 104 bytes)

Note: when repeated, the numbers are not consistent for me, though they seemed to be in your case:

julia> @time Base.show_unquoted(s, x[512])
  0.000017 seconds (4 allocations: 168 bytes)

julia> @time Base.show_unquoted(s, x[512])
  0.000016 seconds (3 allocations: 104 bytes)

In full:

function sprint(f::Function, args...; context=nothing, sizehint::Integer=0)
    s = IOBuffer(sizehint=sizehint)
    if context isa Tuple
        f(IOContext(s, context...), args...)
    elseif context !== nothing
        f(IOContext(s, context), args...)
    else
        f(s, args...)
    end
    String(_unsafe_take!(s))
end

According to @macroexpand, the printing happens after the performance stats are taken, so we should not see any print-related allocations. Here is a benchmark where it is clearer that printing does not affect the measurement:

x = collect(1:1000)
x[1]
a_511_1 = @allocated x[511]
a_511_2 = @allocated x[511]
a_511_3 = @allocated x[511]
a_512_1 = @allocated x[512]
a_512_2 = @allocated x[512]
a_512_3 = @allocated x[512]
a_511_4 = @allocated x[511]
a_512_4 = @allocated x[512]
@show a_511_1
@show a_511_2
@show a_511_3
@show a_512_1
@show a_512_2
@show a_512_3
@show a_511_4
@show a_512_4

# a_511_1 = 0
# a_511_2 = 0
# a_511_3 = 0
# a_512_1 = 16
# a_512_2 = 16
# a_512_3 = 16
# a_511_4 = 0
# a_512_4 = 16

I do not know what the source of the allocations is, but defining the global variable x as const should remove them.
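A sketch of that suggestion (a fresh session is assumed, since x must not already be bound; with a const, concretely-typed global, indexing returns an unboxed Int, so the 16-byte boxing allocation should disappear):

```julia
# With a const global, x has a concrete type (Vector{Int}),
# so x[512] no longer needs to box its result.
const x = collect(1:1000)
x[1]  # warm up
@show @allocated x[511]
@show @allocated x[512]  # expected to drop to 0 as well
```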