I’m starting Chris Rackauckas’ course on parallel computing in Julia. In the first section, to show the relationship between types and speed, he defines two structs: MyComplex (a struct of two Float64 fields) and MySlowComplex (identical except that the two fields are untyped, i.e. Any). We expect computation with MyComplex to be very fast but computation with MySlowComplex to be much slower. He defines a simple function g(x, y) to illustrate this.
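For reference, the setup presumably looks something like this (reconstructed from the description above; the exact definitions in the course may differ, and the body of g here is a hypothetical stand-in):

struct MyComplex
    real::Float64
    imag::Float64
end

struct MySlowComplex
    real   # untyped fields default to Any
    imag
end

# Hypothetical arithmetic so that g has something to compute:
Base.:+(a::MyComplex, b::MyComplex) = MyComplex(a.real + b.real, a.imag + b.imag)
Base.:+(a::MySlowComplex, b::MySlowComplex) = MySlowComplex(a.real + b.real, a.imag + b.imag)
g(x, y) = x + y

Now, indeed,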
a = MyComplex(1.0, 1.0)
b = MyComplex(1.0, 1.0)
@btime g(a, b)
outputs 12.303 ns (1 allocation: 32 bytes) and
a = MySlowComplex(1.0, 1.0)
b = MySlowComplex(1.0, 1.0)
@btime g(a, b)
outputs 61.725 ns (5 allocations: 96 bytes), as expected. However, both @btime g(MyComplex(1.0, 1.0), MyComplex(1.0, 1.0)) and @btime g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) output more or less the same 0.854 ns (0 allocations: 0 bytes). Now, I could understand this if the compiler were doing the computation itself and inlining the result, but the @code_llvm output for these two expressions is radically different. So I suppose I’m confused about what is going on here. I’m very new to Julia and perhaps don’t understand how @code_llvm works in the REPL or how the Julia compiler works.
I’m no expert on @code_llvm, but sub-nanosecond timings with zero allocations generally mean that you are seeing constant propagation: the compiler can tell that you are calling the method with constant inputs and simply precomputes the result.
The usual advice is to interpolate your function arguments like
a, b = MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)
@btime g($a, $b)
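If you want to rule out constant folding entirely, the standard BenchmarkTools idiom is to hide the values behind a Ref (using the same a and b as above):

@btime g($(Ref(a))[], $(Ref(b))[])  # values are read at benchmark time, so nothing can be precomputed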
Thanks. I suppose the constant propagation is therefore happening as a result of some kind of runtime optimization, because there’s no sign of it in @code_native g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) either?
I think @code_native deliberately gives you the code for a generic call to g, specialized only on the argument types and without constant-propagating the particular values, because that usually wouldn’t be useful.
This is also not what BenchmarkTools.jl runs. Effectively, BT wraps your code in a closure and then runs that repeatedly. You can emulate this for @code_native by wrapping the call in an anonymous function and calling it immediately:

@code_native (() -> g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)))()
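You can observe the same effect at the typed-IR level with @code_typed (a sketch; the exact output varies by Julia version):

@code_typed g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0))            # specialized on the argument types only
@code_typed (() -> g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)))()  # constants visible to the compiler; the body is likely folded to a constant return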
I don’t know exactly what the function g does, but note that sometimes the compiler (not Julia itself) is smart enough to realize that you’re repeatedly performing the same operation. It then stops recalculating and just returns the same result. A sign of this is when you get times below 1 ns.
Usually, the way to trick it into actually performing the calculation is to use Ref and interpolate the variable.
Compare the following
using BenchmarkTools

function foo(b)
    a = 0
    for i in 1:1_000_000
        a = 1 + 2 * b  # same calculation in each iteration
    end
    return a
end

@btime foo(2)  # 0.778 ns
while
ref(x) = Ref(x)[]

function foo(b)
    a = 0
    for i in 1:1_000_000
        a = 1 + 2 * b
    end
    return a
end

b = 2
@btime foo(ref($b))  # 1.740 ns
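For the record, you can get the same effect without the helper function by using the Ref idiom from the BenchmarkTools manual directly:

@btime foo($(Ref(b))[])  # the value is read from the Ref at benchmark time, so it can't be constant-folded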