Extra allocation with `T::DataType`?

Sounds right. Exactly one of @code_native and @btime must be misleading here, and a rough @time of a loop that can't be hoisted and has no runtime dispatch seems to corroborate @code_native: 0 allocations even over 1e8 iterations.

julia> foo(Float64,10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)

julia> @time for i in 1:100_000_000
         foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds

julia> @time for i in 1:100_000_000
         foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds

Weirdly, in the other thread I linked earlier, the same @time loop check sided with the @btime difference rather than with the matching @code_native/@code_llvm output. So it may be worth opening issues in both base Julia and BenchmarkTools to figure out what is going on.
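
For anyone who wants a cross-check that doesn't depend on a top-level loop, here is a rough sketch that wraps the same kind of loop in a function and reads the byte count from `@allocated`. The two `foo` definitions are hypothetical stand-ins (the real ones are earlier in the thread and may differ), and `sum_typed`/`sum_untyped` are names I made up for illustration. I'm not claiming what it prints on any given Julia version.

```julia
# Hypothetical stand-ins for the two `foo` methods; the real definitions
# are earlier in the thread and may differ.
foo(T::DataType, x) = exp(T(x))
foo(x) = exp(Float64(x))

# Accumulate the results so the calls cannot be optimized away.
function sum_typed(n)
    s = 0.0
    for i in 1:n
        s += foo(Float64, i % 10)  # argument varies with i, so no hoisting
    end
    return s
end

function sum_untyped(n)
    s = 0.0
    for i in 1:n
        s += foo(i % 10)  # same loop, without the type argument
    end
    return s
end

sum_typed(10); sum_untyped(10)  # compile first

# Bytes allocated by each call, measured after compilation.
bytes_typed = @allocated sum_typed(1_000_000)
bytes_untyped = @allocated sum_untyped(1_000_000)
println((bytes_typed, bytes_untyped))
```

If the two byte counts differ, that points at the type argument itself rather than at how @btime sets up the call; if they match, the benchmark harness looks like the more likely suspect.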