I’ve been told I’m not using @btime correctly. Can someone please explain how to profile the allocations of this function without allocating the output? If I avoid returning the output or wrap it in another function, the whole computation gets compiled away.
julia> @btime foo($vals)
  754.432 ns (0 allocations: 0 bytes)
For an explanation of the use of $ here, check out the BenchmarkTools manual.
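As a rough sketch of the difference, assuming the BenchmarkTools package is installed. The thread does not show the definition of `foo`, so the NaN-counting `foo` and the contents of `vals` below are hypothetical stand-ins:

```julia
using BenchmarkTools

# Hypothetical stand-in; the thread does not show the real `foo`.
foo(v) = count(isnan, v)

# A non-constant global, as in the question (contents are made up).
vals = ones(1024)
vals[1:600] .= NaN

# Without `$`, the benchmark refers to `vals` as an untyped global, so the
# measurement can include dynamic dispatch and boxing of the result.
@btime foo(vals)

# With `$`, the value of `vals` is interpolated into the benchmark expression,
# which avoids the problems of benchmarking with globals.
@btime foo($vals)
```

Timings and allocation counts depend on the machine and Julia version, so none are shown here.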
Thanks. So the cause of the allocations has nothing to do with the output (which is just an Int64 - should be on the stack anyway), but is to do with vals being a non-constant global (which requires some type checking or dispatch or something, that is being measured)?
An Int64 is not always on the stack. In fact, it is almost never on the stack. Mostly it either does not exist at all, lives in a register, or is on the heap. The allocation of the result still IS the issue, and it happens every time the result can't be inferred, including when it needs to be stored in a global.
Ah, that makes sense, thank you.
I’m still curious to understand why the allocating version does or does not allocate depending on the contents of the vector. It seems like for a vector of ones, if I set more than the first 511 values to NaN, it allocates.
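The boxed (heap-allocated) values for small integers are cached, so a small enough result reuses an existing box instead of allocating a new one. That is why you only see the allocation once the result goes past 511.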
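A minimal sketch for probing that threshold yourself, using only `@allocated` from Base. Everything here is an assumption for illustration: `foo` is a hypothetical NaN counter (the real one is not shown in the thread), and `make_vals` and `boxed_call_bytes` are made-up helpers. The `Ref{Any}` is intended to hide the argument type from the compiler so that the result has to come back boxed; whether it does, and how many bytes are reported, may vary between Julia versions.

```julia
# Hypothetical stand-in; the thread does not show the real `foo`.
foo(v) = count(isnan, v)

# Build a vector of ones whose first `n` entries are NaN, so foo returns `n`.
function make_vals(n)
    v = ones(1024)
    v[1:n] .= NaN
    return v
end

# Reading the argument out of a Ref{Any} is meant to prevent type inference,
# mimicking a call on a non-constant global.
function boxed_call_bytes(r::Ref{Any})
    foo(r[])                     # warm up so compilation is not measured
    return @allocated foo(r[])   # bytes allocated by the second call
end

# Compare a result just inside and just outside the range reported above.
for n in (511, 512)
    bytes = boxed_call_bytes(Ref{Any}(make_vals(n)))
    println("result = ", n, ", bytes allocated = ", bytes)
end
```

If the cache explanation applies to your Julia version, the two lines should report different byte counts.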