Can someone enlighten me as to why returning a single Float64 from inside a function takes as much computation time as 2,000,000 additions? See the code below, run on Julia 1.1.
module test

struct teststruct
    x::Array{Float64,1}
end

data = teststruct(rand(1000))

function sum_arg(data)
    s = 0.0
    for i in data.x
        for i in data.x
            s += i
        end
    end
end

using BenchmarkTools

timer() = @time sum_arg(data)
timer()
timer()
timer()
timer()

end
OUTPUTS:
0.008286 seconds (11.14 k allocations: 527.958 KiB)
0.000303 seconds
0.000302 seconds
0.000302 seconds
and compare that computation time with the computation time seen below:
module test

struct teststruct
    x::Array{Float64,1}
end

data = teststruct(rand(1000))

function sum_arg(data)
    s = 0.0
    for i in data.x
        for i in data.x
            s += i
        end
    end
    data.x[1] = s
end

using BenchmarkTools

timer() = @time sum_arg(data)
timer()
timer()
timer()
timer()

end
OUTPUTS:
0.012986 seconds (11.54 k allocations: 548.084 KiB)
0.000886 seconds (2 allocations: 32 bytes)
0.000884 seconds (2 allocations: 32 bytes)
0.000884 seconds (2 allocations: 32 bytes)
and compare the above computation time to the computation time below:
module test

struct teststruct
    x::Array{Float64,1}
end

data = teststruct(rand(1000))

function sum_arg(data)
    s = 0.0
    for i in data.x
        for i in data.x
            s += i
        end
    end
    return s
end

using BenchmarkTools

timer() = @time sum_arg(data)
timer()
timer()
timer()
timer()

end
OUTPUTS:
0.012986 seconds (11.54 k allocations: 548.084 KiB)
0.000886 seconds (2 allocations: 32 bytes)
0.000884 seconds (2 allocations: 32 bytes)
0.000884 seconds (2 allocations: 32 bytes)
I also tried another version wherein I swapped out "return s" for "a=1.0", and the computation time was back down to 0.0003 seconds. So apparently the function sum_arg() cannot output its value without incurring a time penalty greater than 2,000,000 addition steps? Please enlighten me.
Your first function doesn't do anything, so the optimizer can freely replace the whole body with "return nothing". I'm not sure this is exactly what is happening here, but you need to be careful when you benchmark.
I think this simply isn’t a very useful benchmark. The first version of your function returns nothing and has no side effects. Thus, there is no reason that the compiler needs to actually execute it, so the amount of time it takes is not informative.
FYI, with BenchmarkTools you should use @btime sum_arg($data). Also, assigning a macro call to a function that way may not do what you expect; just write the macro expression directly.
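For example, a minimal version of that suggestion (assuming the sum_arg and data defined above are in scope) looks like this:

using BenchmarkTools

# Call the macro directly instead of wrapping it in timer(); the $
# interpolates data, so the non-constant global is treated as a local
# argument and doesn't distort the measurement.
@btime sum_arg($data)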
Why do you say “there is no reason that the compiler needs to actually execute it” when my (maybe naive) understanding is:
the compiler compiles – it doesn't "execute".
I thought the @btime macro measures execution duration, not compiling duration. So why are you mentioning the compiler?
maybe you mean to say “there is no reason that the CPU needs to actually execute it” … but then how does the 1 million additions get executed?
Can you please explain clearly why you think it's not a good benchmark measurement? The "timer() = @time sum_arg(data)" method was recommended on this forum by another user; that's why I used it. I'm happy to use a different benchmarking method.
@tamasgal I don't understand how your example code is relevant. I think maybe you're missing the point: I executed the exact same code in my two examples, with the only difference between them being a "return s" at the end of the function or not. The "return s" creates a large increase in execution time. Why?
The example is relevant because it shows that when the result of a computation is unused and has no effect, the compiler can remove it completely. The same thing is happening in your benchmark.
No worries, I think I understand what you don't understand; it's just something you have to wrap your head around.
The compiler is clever. If you write a function which does not return anything, the compiler will create machine code which does exactly nothing. Just like Kristoffer told you in the first reply.
In my examples you see the exact same behaviour. Julia is a high-level language and you usually don’t tell your computer what it has to do “step-by-step”. You write your algorithm and let the compiler do the rest.
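If you want to see that for yourself, here is one sketch (assuming the no-return sum_arg and data from above are in scope; the exact output depends on your Julia version and optimization level): look at the code the compiler actually generates.

using InteractiveUtils  # provides @code_llvm and @code_native (loaded automatically in the REPL)

# For the side-effect-free version that returns nothing, the optimized
# IR collapses to (almost) nothing, because the loop's result is never used.
@code_llvm sum_arg(data)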
Just to keep belaboring the point, the observable external effects of the following functions are the same:
function f1()
    x = 0
    for i in 1:1000
        x = x + i
    end
end

function f2()
end
f1 returns nothing and doesn’t modify any external state, and so does f2. The compiler is thus free to transform f1 into f2 if it sees fit to do so, since there is no difference in their effects.
Whether or not your CPU actually executed some particular number of addition instructions is considered to be an implementation detail, not a measurable effect. This is not just a cop-out; it’s actually crucial in Julia or any other programming language: Without that assumption, every single performance improvement in the language would have to be considered a breaking change. That’s obviously not something anyone wants.
Note also that this is a general challenge when benchmarking code in languages with optimizing compilers. You will see similar behavior in C++, for example (depending perhaps on your -O settings).
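One way around it, sketched below with a hypothetical name sum_arg_ret (not from the original post), is to return the result so the additions stay observable, and to benchmark with @btime and argument interpolation:

using BenchmarkTools

# Same nested loop as above, but the sum is returned, so the work is an
# observable effect and cannot be optimized away.
function sum_arg_ret(data)
    s = 0.0
    for a in data.x
        for b in data.x
            s += b
        end
    end
    return s
end

@btime sum_arg_ret($data)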
Aha… thanks to Jeff and Tamasgal for explaining it clearly. So you're saying the compiler decides NOT to generate code for the summation that I clearly told it to perform.
Ok, now I understand. I’m a newbie to high performance code, so thank you for enlightening me.
You probably don't mean it this way, but in written form this comes across as making a point of not thanking @tamasgal. He actually went through considerable effort to give examples illustrating what's going on. Credit to both Jeff and Tamas (and others in the thread) seems in order.
Yeah, when I saw his suggested code of "i_crash_your_computer()", I got a little worried that he was trying to get me to enter some dangerous code into my computer, so his response weirded me out a bit. Then his quick jump to machine code was confusing because I haven't worked with machine code since 1998, so I couldn't gain much insight from it. Now that I understand better, I definitely appreciate his post, and will edit my responses accordingly.
BTW, in case you missed it, your code actually uses the @time macro, not the @btime macro. The latter lives in BenchmarkTools.jl, while the former is in Base. You should use @btime.
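For reference, a minimal contrast of the two (again assuming sum_arg and data from above are in scope):

using BenchmarkTools

@time  sum_arg(data)    # Base: times a single call; the first call also includes compilation
@btime sum_arg($data)   # BenchmarkTools: runs many samples and reports the minimum time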