Sounds right. Either @code_native or @btime is inaccurate, and roughly @time-ing a non-hoisting loop with no runtime dispatch seems to corroborate @code_native here: 0 allocations even over 1e8 iterations.
julia> foo(Float64, 10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)

julia> @time for i in 1:100_000_000
           foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds

julia> @time for i in 1:100_000_000
           foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds
Weirdly, in the other thread I linked earlier, this approach corroborated the @btime difference instead of the matching @code_native/@code_llvm output. So maybe issues should be opened in both base Julia and BenchmarkTools to figure out what is going on.