By adding some work inside the if block (the test2 function), I get a 40x slowdown compared to a truly empty if block (the test function), even though the condition is always false and so nothing in the block ever runs.

    using Printf  # required for @printf on Julia 0.7 and later

    function test(m::Array{Int8,2})
        n = size(m)[1]
        progress = 0
        @inbounds for i in 1:n, j in i:n
            nonoverlap = true
            progress += 1
            if progress % 100000000 == 0
                # do nothing
            end
        end
    end

    function test2(m::Array{Int8,2})
        n = size(m)[1]
        progress = 0
        @inbounds for i in 1:n, j in i:n
            nonoverlap = true
            progress += 1
            if progress % 100000000 == 0
                @printf("%d\n", progress)
            end
        end
    end

    m = convert(Array{Int8,2}, rand(0:1, 22, 1000))

The benchmark results for test(m) and test2(m) are attached. I am confused about where the extra running time comes from. Thanks.
I see that you have tried to quote your code, but it’s not quite right. Please take a look at this post to see how to format code for Discourse: PSA: how to quote code with backticks
You should also remember to interpolate variables when using BenchmarkTools: @benchmark test($m), instead of @benchmark test(m). It may not make a difference here, but in general, it does, and it removes the need to double-check if performance issues are due to the lack of interpolation.
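To make the interpolation advice concrete, here is a minimal sketch. It assumes the third-party BenchmarkTools package is installed, and `count_ones` is a hypothetical stand-in for whatever function is being timed, not something from the original post.

```julia
using BenchmarkTools  # third-party package; assumed to be installed

# Hypothetical stand-in for the function under test.
count_ones(m::Array{Int8,2}) = sum(Int, m)

m = convert(Array{Int8,2}, rand(0:1, 22, 1000))

# Without `$`, `m` is accessed as a non-constant global inside the benchmark,
# which can add overhead that has nothing to do with the function itself.
@btime count_ones(m)

# With `$`, the value of `m` is interpolated into the benchmark expression.
@btime count_ones($m)
```

For a function this cheap the two timings may or may not differ noticeably; the point is only that the interpolated form removes one possible source of doubt.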
When benchmarking, it is important to be able to do some napkin math to figure out whether the results you get are at all reasonable.
For example, here you have a double loop doing roughly 22 * 22 ~ 500 iterations (strictly 22 * 23 / 2 = 253, since j starts at i, but only the order of magnitude matters here). As a very rough estimate, we can say that a CPU can do 1 integer addition per cycle. So a 3 GHz CPU should be able to do 500 iterations in
500 / (3 * 10^9) * 10^9 ns, which is approximately 170 ns. So clearly, getting a result of 4 ns is just not reasonable.
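The same napkin math can be written out as a few lines of Julia. This is only a sketch of the estimate above: the 3 GHz clock and the one-addition-per-cycle figure are the rough assumptions from the post, and `est_ns` is a helper name introduced here for illustration.

```julia
# Napkin math for `for i in 1:n, j in i:n` with n = size(m)[1] = 22.
n = 22
upper_bound = n * n              # the rough "about 500" figure
exact_iters = n * (n + 1) ÷ 2    # triangular count, because j starts at i

clock_hz = 3e9                   # assumed: 3 GHz, about 1 integer add per cycle
est_ns(iters) = iters / clock_hz * 1e9   # hypothetical helper: iterations -> ns

println("upper bound: ", upper_bound, " iterations")
println("exact count: ", exact_iters, " iterations")
println("rough time for 500 iterations: ", round(est_ns(500), digits = 1), " ns")
println("rough time for exact count:    ", round(est_ns(exact_iters), digits = 1), " ns")
```

Either way the estimate lands far above 4 ns, which is the signal that the loop is not actually being executed.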
In test, there is nothing going on inside the if block, so it is not necessary to evaluate the condition, and therefore it is not necessary to run the loop either. The whole function is basically optimized away.
Even though it’s always false, you don’t know that a priori, so you still have to evaluate the condition at every step of the loop.
1+1 also doesn’t do anything visible from the caller. A compiler is free to optimize as much as it wants as long as it doesn’t change any observable result. This is known as the “as if” rule.
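One way to see the “as if” rule at work is to look at the code the compiler generates for a loop with no observable effect. The sketch below uses `@code_llvm` from the InteractiveUtils standard library; `empty_loop` is a hypothetical function modeled on test, and the exact output depends on the Julia version and optimization settings, so none is shown here.

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Hypothetical function: the loop only updates a local that is never returned
# or printed, so nothing it does is observable from the caller.
function empty_loop(n::Int)
    progress = 0
    for i in 1:n, j in i:n
        progress += 1
        if progress % 100000000 == 0
            # do nothing
        end
    end
end

# Inspect the generated LLVM IR; if the loop was optimized away,
# the body is typically little more than a return.
@code_llvm empty_loop(22)
```

If the IR contains no loop, the 4 ns timing is measuring the cost of a function call that returns immediately, not the loop.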
The @printf macro generates a ton of inline code (which is not great and should be fixed). You could try putting it in its own function and see if that helps at all.
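A minimal sketch of that suggestion follows. The names `report_progress` and `test3` are hypothetical, `@noinline` is an extra assumption added here to keep the helper out of the loop body, and returning `progress` is a change from the original functions so the result can be checked. Whether this actually closes the performance gap would need to be measured.

```julia
using Printf

# Hypothetical helper: keeps the bulky @printf expansion out of the hot loop.
@noinline report_progress(progress) = @printf("%d\n", progress)

function test3(m::Array{Int8,2})
    n = size(m, 1)
    progress = 0
    for i in 1:n, j in i:n
        progress += 1
        if progress % 100000000 == 0
            report_progress(progress)   # only a function call remains in the loop
        end
    end
    return progress   # returned so the loop has an observable result
end

m = convert(Array{Int8,2}, rand(0:1, 22, 1000))
test3(m)
```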