I have a small function f whose speed I am trying to optimize. Ideally, when the argument to f is a constant or type-inferred, the function should be compiled away. When using @btime I encountered some behavior I don’t understand. Basically, I get drastically different results if I use an interpolated variable instead of the value of that variable.
using BenchmarkTools
# f has previously been defined
x = 123456789 # any integer
@btime f(123456789) # 0.001 ns
@btime f($x) # 300ns
I thought that interpolated variables in @btime expressions were replaced by their values. If so, the two benchmarks should be identical. Furthermore, the output of @code_llvm for both f(123456789) and f(x) is the same, which makes the timing discrepancy even more surprising.
I feel like I must be missing something basic and important here. Can anyone enlighten me?
P.S. – Defining x as const doesn’t change the situation.
Sub-nanosecond (less than one clock cycle) results indicate that your benchmark has been compiled away (constant propagation is your “enemy” here). You can wrap the integer literal into a Ref to avoid constant propagation.
@carstenbauer Yes, but I want it to be compiled away. (Recall that my goal was to design f so that it is often compiled away in practice.) What I don’t understand is why it is not also compiled away when I use variable interpolation.
It sometimes is, see e.g. the example at the bottom of the BenchmarkTools docs here.
For the integer literal the benchmark itself is compiled away (not just the function f). As you noticed, the llvm / native code produced for f(x) is identical to f(123456789). What you’re seeing is just an artefact of the benchmarking.
If you want to make sure it’s compiled away, the best way is to write something like g() = f(123456789) and look at the @code_llvm g() to see if it’s doing anything other than returning a constant.
function g(..., ::Val{x}) where {x}
y = f(x)
# compute a result that depends on y
end
What I want to happen in practice is that when x is inferrable the run-time cost of f within g is as small as possible. My strategy has been to try design f so that in such cases as much of it as possible is compiled away. Given the all above, I don’t know how to reliably benchmark either f or g in this situation. (I get the same sort of discrepancy benchmarking g as I do benchmarking f.)
If you just want to check whether a method call will be “compiled away”, i.e. evaluated at compile-time if its arguments’ values are known at compile-time, looking at compiled code e.g. @code_llvm is way more reliable than timing.
I think you were confused by exactly what “compiled away” means in this context. When you do @btime f(123456789), you’re not benchmarking f(123456789), you’re actually benchmarking temp() = f(123456789). f(123456789) is called in the local scope of a temporary method, so when the method is compiled, the compiler decided to “compile away” the f(123456789) call to a constant value. This decision also depends on what f does, like f(x) = rand(x) could never be compiled away.
When you interpolate @btime f($123456789), you benchmark something like temp(x) = f(x). The top method call temp(123456789) is not compiled away, it uses the compiled code temp(::Int) to take in a runtime integer and calculate stuff at runtime. So it actually represents what happens when f(123456789) is evaluated at runtime.
Honestly I have no idea why @btime is designed that way, it’s a source of a lot of confusion especially because the unorthodox usage of $ in a macro call isn’t even a guarantee of preventing the benchmark from compiling the method call away. Just seems better if none of the arguments are ever assumed constant and we can elect to write our own temporary methods with constant arguments. I’m guessing it’s actually some difficult design problem involving the benchmark loop being run at global scope rather than the macro call scope.
One important thing to realize is that in every language, you should be highly suspicious of any benchmark that reports less than a microsecond or so. Once you get below that level of timing, things like calling convention and inlining start to matter a lot which can vary depending on how the function is called.
If I understand what you are saying, then the situation is something like this?
Case 1: No Interpolation
x = 1234
@btime f(x)
translates roughly to
x = 1234
@time call(f, x) # f(::Int) called with 1234
Case 2: Interpolation
x = 1234
@btime f($x)
translates roughly to
@time call(f, 1234) # f(::Int) called with 1234
Case 3: Literal
x =1234
@btime f(1234)
translates roughly to
temp() = f(1234) # f(1234) compiled to a constant
@time call(temp) # the constant is returned
(end cases)
It is not obvious to me why the 3rd case would be handled differently than the first two, but that would explain what I am seeing. Evidently this is much more complicated than I had supposed – I had supposed that everything was more like case 3, with interpolation causing variables appearing in temp() to be replaced by their values.
I explained case 3 earlier, and I can also explain case 1 is benchmarking something like temp() = f(x). Because x is a non-const global variable that can’t be assumed constant, the method f is evaluated in the runtime loop. However, x’s type is uninferrable too, so you’re also benchmarking the runtime dispatch of f. I amended the a+b example in my earlier comment to show how much time that adds. I suppose it is a fair benchmark if your function call must be runtime dispatched in practice, but it’s more often the case that people wrote their function call to be dispatched at compile-time.
I can’t exactly explain how case 2 works, @btime does some macro trickery to make $ in source code benchmark a function call without runtime dispatch. It’s called “interpolation” but $-interpolation typically does not work in source code, only String literals (" ", """ """) or quoted Expr (:( ), quote end). @eval enables $ in source code with a simpler trick: wrapping the source code in a quote block before it’s put in a Core.eval call expression.