I’m starting Chris Rackauckas’ course on parallel computing in Julia. In the first section, to show the relationship between types and speed, he defines two structs: MyComplex (a struct of two Float64 fields) and MySlowComplex (identical except that the two fields are untyped, i.e. Any). We expect computation with MyComplex to be very fast but computation with MySlowComplex to be much slower. He defines a simple function g(x, y) to illustrate this.
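For reference, the setup presumably looks something like this (reconstructed from the description above; the exact definitions in the course may differ, and the body of g here is a hypothetical stand-in):

struct MyComplex
    real::Float64
    imag::Float64
end

struct MySlowComplex
    real   # untyped fields default to Any
    imag
end

# Hypothetical arithmetic so that g has something to compute:
Base.:+(a::MyComplex, b::MyComplex) = MyComplex(a.real + b.real, a.imag + b.imag)
Base.:+(a::MySlowComplex, b::MySlowComplex) = MySlowComplex(a.real + b.real, a.imag + b.imag)
g(x, y) = x + y

Now, indeed,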
a = MyComplex(1.0, 1.0)
b = MyComplex(1.0, 1.0)
@btime g(a, b)
outputs 12.303 ns (1 allocation: 32 bytes) and
a = MySlowComplex(1.0, 1.0)
b = MySlowComplex(1.0, 1.0)
@btime g(a, b)
outputs 61.725 ns (5 allocations: 96 bytes), as expected. However, both @btime g(MyComplex(1.0, 1.0), MyComplex(1.0, 1.0)) and @btime g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) output more or less the same 0.854 ns (0 allocations: 0 bytes). Now, I could understand this if the compiler were doing the computation itself and inlining the result, but the @code_llvm output for these two expressions is radically different. So I suppose I’m confused about what is going on here. I’m very new to Julia and perhaps don’t understand how @code_llvm works in the REPL or how the Julia compiler works.
I’m no expert on @code_llvm, but sub-nanosecond timings with zero allocations generally mean that you are seeing constant propagation: the compiler can tell that you are calling the method with constant inputs and simply precomputes the result.
The usual advice is to interpolate your function arguments like
a, b = MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)
@btime g($a, $b)
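If you want to rule out constant folding entirely, the standard BenchmarkTools idiom is to hide the values behind a Ref (using the same a and b as above):

@btime g($(Ref(a))[], $(Ref(b))[])  # values are read at benchmark time, so nothing can be precomputed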
Thanks. I suppose the constant propagation is therefore happening as a result of some kind of runtime optimization, because there’s no sign of it in @code_native g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)) either?
I think @code_native deliberately gives you the code for a generic call to g, specialized only on the argument types and without constant-propagating the particular values, because that usually wouldn’t be useful.
This is also not what BenchmarkTools.jl runs. Effectively, BT wraps your code in a closure and then runs that repeatedly. You can emulate this for @code_native by wrapping the call in an anonymous function and calling it immediately:

@code_native (() -> g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)))()
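You can observe the same effect at the typed-IR level with @code_typed (a sketch; the exact output varies by Julia version):

@code_typed g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0))            # specialized on the argument types only
@code_typed (() -> g(MySlowComplex(1.0, 1.0), MySlowComplex(1.0, 1.0)))()  # constants visible to the compiler; the body is likely folded to a constant return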
I don’t know exactly what the function g does, but note that sometimes the compiler (not Julia itself) is smart enough to realize that you’re repeatedly performing the same operation. It then stops recalculating and just returns the same result. A sign of this is when you get times below 1 ns.
Usually, the way to trick it into actually performing the calculation is to use Ref and interpolate the variable.
Compare the following
using BenchmarkTools

function foo(b)
    a = 0
    for i in 1:1_000_000
        a = 1 + 2 * b  # same calculation in each iteration
    end
    return a
end

@btime foo(2)  # 0.778 ns
while
ref(x) = Ref(x)[]

function foo(b)
    a = 0
    for i in 1:1_000_000
        a = 1 + 2 * b
    end
    return a
end

b = 2
@btime foo(ref($b))  # 1.740 ns
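For the record, you can get the same effect without the helper function by using the Ref idiom from the BenchmarkTools manual directly:

@btime foo($(Ref(b))[])  # the value is read from the Ref at benchmark time, so it can't be constant-folded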