Why Does @btime Show Near-Zero Execution Time Despite Non-Trivial @code_native and @code_llvm Output in Julia?

I’m starting Chris Rackauckas’ course on parallel computing in Julia. In the first section, to show the relationship between types and speed, he defines two structs: MyComplex (a struct of two Float64 fields) and MySlowComplex (the same struct, but with both fields typed as Any). We expect computation with MyComplex to be very fast and computation with MySlowComplex to be much slower. He defines a simple function g(x, y) to illustrate this.
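For reference, here is a minimal sketch of what the definitions look like (reconstructed from the description above; the exact bodies in the course may differ):

struct MyComplex
    real::Float64
    imag::Float64
end

struct MySlowComplex
    real::Any
    imag::Any
end

# assumed, for illustration: g adds the two values via an overloaded +
Base.:+(x::MyComplex, y::MyComplex) = MyComplex(x.real + y.real, x.imag + y.imag)
Base.:+(x::MySlowComplex, y::MySlowComplex) = MySlowComplex(x.real + y.real, x.imag + y.imag)
g(x, y) = x + y

Now indeed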

a = MyComplex(1.0, 1.0)
b = MyComplex(1.0, 1.0)
@btime g(a, b)

outputs 12.303 ns (1 allocation: 32 bytes) and

a = MySlowComplex(1.0, 1.0)
b = MySlowComplex(1.0, 1.0)
@btime g(a, b)

outputs 61.725 ns (5 allocations: 96 bytes), as expected. However, @btime g(MyComplex(1.0, 1.0), MyComplex(1.0, 1.0)) and @btime g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) both output more or less the same 0.854 ns (0 allocations: 0 bytes). I could understand this if the compiler were doing the computation itself and inlining the result, but @code_llvm gives radically different output for the two expressions. So I suppose I’m confused about what is going on here. I’m very new to Julia and perhaps don’t understand how @code_llvm works in the REPL and how the Julia compiler works.


I’m no expert on @code_llvm, but getting sub-nanosecond timings without allocations generally means that you are seeing constant propagation. That is, the compiler can tell that you are calling the method with constant inputs and just precalculates the result at compile time.
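A tiny standalone illustration (not the course code): when every input is a compile-time constant, the compiled body can reduce to just the answer.

h() = 2 * 3 + 1   # all inputs are literals, so the result is known at compile time
@code_llvm h()    # the body is essentially `ret i64 7`; no arithmetic happens at run time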

The usual advice is to interpolate your function arguments, like so:

a, b = MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)
@btime g($a, $b)

Thanks. I suppose the constant propagation must therefore be happening in a later optimization stage, since there’s no sign of it in @code_native g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) either?

I think @code_native really gives you the code for a generic call to g, without constant propagation, because constant-propagated output usually wouldn’t be useful.
This is also not what BenchmarkTools.jl runs. Effectively, BenchmarkTools wraps your code in a closure and then runs that closure repeatedly. You can emulate this for @code_native by wrapping the call in an anonymous function and calling it immediately:
@code_native (()->g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)))()


I don’t know exactly what the function g does, but notice that sometimes the compiler (LLVM, not Julia itself) is smart enough to realize that you’re repeatedly performing the same calculation. It then stops recalculating and just returns the same result. A telltale sign of this is a reported time below about 1 ns.

Usually, the way to trick the compiler into actually performing the calculation is to use Ref and interpolate the variable.

Compare the following

using BenchmarkTools

function foo(b)
    a = 0 
    for i in 1:1_000_000
        a =  1 + 2 * b # same calculation in each iteration
    end

    return a
end

@btime foo(2) # 0.778 ns

while

ref(x) = Ref(x)[]  # read x back through a Ref so the compiler cannot constant-fold it

function foo(b)
    a = 0 
    for i in 1:1_000_000
        a =  1 + 2 * b
    end

    return a
end

b = 2 
@btime foo(ref($b)) # 1.740 ns


(Even in the second case, it looks like LLVM is eliminating the loop entirely rather than executing it a million times: the body is loop-invariant, so only the final 1 + 2 * b survives.)
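You can check this directly (a quick sketch, assuming the foo defined above): inspect the compiled code and look for the loop.

@code_llvm foo(2)  # on a typical recent Julia/LLVM, no loop appears;
                   # the body reduces to a single 1 + 2*b computation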
