Confusing behavior of @btime

I have a small function f whose speed I am trying to optimize. Ideally, when the argument to f is a constant or type-inferred, the function should be compiled away. When using @btime I encountered some behavior I don’t understand. Basically, I get drastically different results if I use an interpolated variable instead of the value of that variable.

using BenchmarkTools
# f has previously been defined

x = 123456789          # any  integer
@btime f(123456789)    # 0.001 ns
@btime f($x)           # 300ns

I thought that interpolated variables in @btime expressions were replaced by their values. If so, the two benchmarks should be identical. Furthermore, the output of @code_llvm for both f(123456789) and f(x) is the same, which makes the timing discrepancy even more surprising.

I feel like I must be missing something basic and important here. Can anyone enlighten me?

P.S.: Defining x as const doesn’t change the situation.

Sub-nanosecond (less than one clock cycle) results indicate that your benchmark has been compiled away (constant propagation is your “enemy” here). You can wrap the integer literal into a Ref to avoid constant propagation.

julia> f(x) = x^2
f (generic function with 1 method)

julia> @btime f(123456789);
  0.043 ns (0 allocations: 0 bytes)

julia> @btime f(Ref(123456789)[]);
  1.513 ns (0 allocations: 0 bytes)

julia> x = 123456789;

julia> @btime f($x);
  1.514 ns (0 allocations: 0 bytes)

@carstenbauer Yes, but I want it to be compiled away. (Recall that my goal was to design f so that it is often compiled away in practice.) What I don’t understand is why it is not also compiled away when I use variable interpolation.

It sometimes is; see e.g. the example at the bottom of the BenchmarkTools docs.

For the integer literal the benchmark itself is compiled away (not just the function f). As you noticed, the llvm / native code produced for f(x) is identical to f(123456789). What you’re seeing is just an artefact of the benchmarking.


If you want to make sure it’s compiled away, the best way is to write something like
g() = f(123456789) and look at @code_llvm g() to see whether it does anything other than return a constant.


Demonstrating Oscar’s point:

julia> const x = 123456789;

julia> f(x) = x^2
f (generic function with 1 method)

julia> g() = f(123456789);

julia> h() = f(x);

julia> @code_llvm g()
;  @ REPL[3]:1 within `g`
define i64 @julia_g_201() #0 {
top:
  ret i64 15241578750190521
}

julia> @code_llvm h()
;  @ REPL[4]:1 within `h`
define i64 @julia_h_213() #0 {
top:
  ret i64 15241578750190521
}

Note that the const x is important here as otherwise the compiler can’t assume that x will not change in the future.
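To see the flip side of that note (a sketch, assuming the same f(x) = x^2 as above; h2 and y are hypothetical names): with a non-const global, the same kind of wrapper no longer folds to a constant.

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

f(x) = x^2

y = 123456789          # non-const, untyped global
h2() = f(y)            # the compiler cannot assume y's value, or even its type

# Instead of `ret i64 15241578750190521`, this should emit a lookup of the
# global binding plus a dynamically dispatched call to f:
@code_llvm h2()
```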


It sometimes is

Are there any principles I can use to determine when “sometimes” is? What is interpolation if not replacing the variable by its value?

For the integer literal the benchmark itself is compiled away (not just the function f ).

I’m not sure I understand the difference.

What you’re seeing is just an artefact of the benchmarking.

That’s what bothers me. I am left with the feeling that I cannot trust @btime because I will never know what it is actually benchmarking.

The actual context is something like this:

function g(..., ::Val{x}) where {x}
  y = f(x)
  # compute a result that depends on y
end

What I want to happen in practice is that when x is inferrable the run-time cost of f within g is as small as possible. My strategy has been to try to design f so that in such cases as much of it as possible is compiled away. Given all of the above, I don’t know how to reliably benchmark either f or g in this situation. (I get the same sort of discrepancy benchmarking g as I do benchmarking f.)
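One way to exercise the inferrable case (a sketch only; f and g here are simplified stand-ins, since the real definitions aren’t shown) is to hide the constant inside a zero-argument wrapper, so that x is a compile-time constant for the whole call chain, and then check both the timing and the generated code:

```julia
using BenchmarkTools
using InteractiveUtils

f(x) = x^2                          # stand-in for the real f
g(::Val{x}) where {x} = f(x) + 1    # simplified stand-in for the real g

bench() = g(Val(123456789))         # x is known at compile time here

@btime bench()      # a sub-nanosecond time means the whole call folded
@code_llvm bench()  # more reliable: the body should just return a constant
```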

If you just want to check whether a method call will be “compiled away”, i.e. evaluated at compile-time if its arguments’ values are known at compile-time, looking at compiled code e.g. @code_llvm is way more reliable than timing.
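For example (a sketch, assuming some pure f; wrap is a hypothetical helper name), @code_typed makes a folded call visible directly in the typed IR:

```julia
using InteractiveUtils  # @code_typed / @code_llvm

f(x) = x^2
wrap() = f(123456789)   # call with a compile-time-known argument

# If the call was evaluated at compile time, the typed IR should be just
# `return 15241578750190521`, with no arithmetic left:
@code_typed wrap()
```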

I think you were confused by exactly what “compiled away” means in this context. When you do @btime f(123456789), you’re not benchmarking f(123456789); you’re actually benchmarking temp() = f(123456789). f(123456789) is called in the local scope of a temporary method, so when that method is compiled, the compiler decides to “compile away” the f(123456789) call to a constant value. This decision also depends on what f does; for example, f(x) = rand(x) could never be compiled away.
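A quick illustration of that last point (a sketch; rand() * x is used instead of rand(x) to keep the example cheap, and the names are hypothetical): a function whose result depends only on its argument can fold, while one that reads RNG state cannot:

```julia
using InteractiveUtils

pure(x) = x^2           # result depends only on x: eligible for folding
impure(x) = rand() * x  # depends on global RNG state: can never be folded

wrap_pure() = pure(123456789)
wrap_impure() = impure(123456789)

@code_llvm wrap_pure()    # should just return the precomputed constant
@code_llvm wrap_impure()  # should still call rand and multiply at runtime
```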

When you interpolate, e.g. @btime f($123456789), you benchmark something like temp(x) = f(x). The top-level call temp(123456789) is not compiled away; it uses the compiled code for temp(::Int) to take in a runtime integer and do the calculation at runtime. So it actually represents what happens when f(123456789) is evaluated at runtime.

Honestly I have no idea why @btime is designed that way; it’s a source of a lot of confusion, especially because the unorthodox use of $ in a macro call isn’t even a guarantee that the benchmark won’t compile the method call away. It just seems better if none of the arguments were ever assumed constant and we could elect to write our own temporary methods with constant arguments. I’m guessing it’s actually some difficult design problem involving the benchmark loop being run at global scope rather than in the macro call’s scope.


On Julia v1.7.3 + BenchmarkTools v1.3.1, the @btime $a + $b example is no longer sub-nanosecond:

julia> a = 1; b = 2
2
julia> @btime 1 + 2 # constant folded call
  0.035 ns (0 allocations: 0 bytes)
3
julia> @btime $1 + $2 # interpolate values
  1.509 ns (0 allocations: 0 bytes)
3
julia> @btime $a + $b # interpolate values via global variables
  1.509 ns (0 allocations: 0 bytes)
3
julia> @btime $(Ref(a))[] + $(Ref(b))[] # documented Ref trick
  1.508 ns (0 allocations: 0 bytes)
3
julia> @btime a + b # use global variables
  18.460 ns (0 allocations: 0 bytes)
3

One important thing to realize is that in every language, you should be highly suspicious of any benchmark that reports less than a microsecond or so. Once you get below that level of timing, things like calling convention and inlining start to matter a lot which can vary depending on how the function is called.


If I understand what you are saying, then the situation is something like this?

Case 1: No Interpolation

x = 1234
@btime f(x)

translates roughly to

x = 1234
@time call(f, x)   # f(::Int) called with 1234

Case 2: Interpolation

x = 1234
@btime f($x)

translates roughly to

@time call(f, 1234)    # f(::Int) called with 1234

Case 3: Literal

x = 1234
@btime f(1234)

translates roughly to

temp() = f(1234)      # f(1234) compiled to a constant
@time call(temp)     # the constant is returned

(end cases)

It is not obvious to me why the 3rd case would be handled differently than the first two, but that would explain what I am seeing. Evidently this is much more complicated than I had supposed – I had supposed that everything was more like case 3, with interpolation causing variables appearing in temp() to be replaced by their values.

I explained case 3 earlier, and case 1 is similar: it benchmarks something like temp() = f(x). Because x is a non-const global variable that can’t be assumed constant, f is evaluated in the runtime loop. However, x’s type is uninferrable too, so you’re also benchmarking the runtime dispatch of f. I amended the a + b example in my earlier comment to show how much time that adds. I suppose it is a fair benchmark if your function call must be runtime-dispatched in practice, but it’s more often the case that people write their function calls to be dispatched at compile time.

I can’t exactly explain how case 2 works; @btime does some macro trickery to make $ in source code benchmark a function call without runtime dispatch. It’s called “interpolation”, but $-interpolation typically does not work in source code, only in String literals (" ", """ """) or quoted Expr (:( ), quote end). @eval enables $ in source code with a simpler trick: wrapping the source code in a quote block before it’s put into a Core.eval call expression.
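To illustrate the @eval mechanism just mentioned (a small sketch; the names val and shifted are hypothetical): $ splices the current value into the expression before it is evaluated, so the value is baked into the new method.

```julia
val = 42
@eval shifted() = $val + 1  # defines shifted() = 42 + 1; the 42 is baked in

val = 1000                  # rebinding val does not affect shifted
shifted()                   # still 43
```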