I want to compare the speed of the built-in dot product dot() (which Julia speeds up) against a dot product without any parallelism, to see how much faster Julia makes the dot product.
For the non-parallel dot product, I wrote a for loop:
function sdot(x, y)
    n = length(x)
    res = 0.0
    for i in 1:n
        res += x[i] * y[i]
    end
    return res
end
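For reference, this is roughly how I plan to benchmark the two versions. It is only a sketch, assuming the BenchmarkTools package is installed; the vector length 10^6 is an arbitrary choice:
using LinearAlgebra, BenchmarkTools

x = rand(10^6)          # arbitrary test size
y = rand(10^6)

@btime sdot($x, $y)     # hand-written serial loop
@btime dot($x, $y)      # LinearAlgebra.dot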
I also checked that ‘@show Threads.nthreads()’ returned 1, to make sure Julia uses only one thread.
But I’m not sure whether Julia still applies some other form of parallelism or automatic optimization (such as SIMD vectorization) to the for loop in my sdot() by default, because I know Julia is very smart.
My final question is: how can I avoid any kind of automatic speedup in Julia, so that the comparison is fair?
Maybe it’s useful to mention here that for arrays of floating-point numbers, LinearAlgebra.dot actually calls the underlying BLAS implementation (OpenBLAS by default, or MKL if you specifically rebuilt Julia with it).
In this specific instance, it would perhaps be more accurate to say that dot is sped up by C (and not Julia). Consequently, perhaps dot is not the best choice for your benchmark. Could you please tell us more about the kind of tests you’re performing, and the kind of conclusions you’d like to draw from them?
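In the meantime, if you do want to keep dot in the comparison, note that the BLAS call can be multithreaded on its own, independently of Threads.nthreads(). A minimal sketch of how to pin it to a single thread (assuming Julia 1.6 or later for BLAS.get_num_threads):
using LinearAlgebra

BLAS.set_num_threads(1)        # force the BLAS backend to run on one thread
@show BLAS.get_num_threads()   # should now print 1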
I already tested the speed under different optimization levels, and the results are consistent with your explanation. But I would still like to understand the details, because one question came out of my testing:
Why do the different optimization levels not affect BLAS (which is used by dot()), but do affect the for loop (which is used in sdot)?
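For context, this is roughly how I have been inspecting what Julia generates for sdot under a given optimization level; it is just a sketch, and the output will depend on the Julia version and the -O flag used at startup:
# start Julia with the optimization level to test, e.g.
#   julia -O0    or    julia -O3
x = rand(100); y = rand(100)
@code_llvm debuginfo=:none sdot(x, y)   # inspect the LLVM IR generated for the loop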