How to make Julia slow?


I want to compare the speed of dot(), the built-in dot product that Julia speeds up, against a dot product without any parallelism. I want to see how much Julia speeds up the dot product.

For the non-parallel dot product, I wrote a for loop:

function sdot(x, y)
    n   = length(x)
    res = 0.0
    for i in 1:n
        res += x[i] * y[i]
    end
    return res
end

I also checked that @show Threads.nthreads() returned 1, to make sure Julia uses only one thread.

But I'm not sure whether Julia will still use some other parallel method by default to speed up the for loop in my sdot(), because I know Julia is very smart.

My final question is: how can I avoid any kind of speed-up in Julia? I want to make the comparison fair.


What are you comparing with?

sdot() and dot().

You could try starting Julia with julia -O0 or julia -O1.

-O, --optimize={0,1,2,3}
Set the optimization level (default level is 2 if unspecified or 3 if used without a level)

Another way to make Julia slow is to start Julia with julia --inline=no.

Control whether inlining is permitted, including overriding @inline declarations
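
To confirm which settings a session was actually started with, something like the following can be run inside it. This is only a sketch: Base.JLOptions() is an internal, undocumented struct, so the field names opt_level and can_inline are an assumption that may not hold on every Julia version.

```julia
# Base.JLOptions() is internal and undocumented; the field names used here
# (opt_level, can_inline) are assumptions that may change between versions.
opts = Base.JLOptions()

println("optimization level: ", opts.opt_level)        # value passed via -O
println("inlining permitted: ", opts.can_inline == 1)  # false under --inline=no
println("Julia threads:      ", Threads.nthreads())
```

Printing these at the top of a benchmark script makes it harder to mix up results from differently configured sessions.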

Julia doesn’t do any automatic parallelization. You have to explicitly annotate loops that you want to parallelize.
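
As a rough illustration of what "explicitly annotate" means, here is a sketch with two versions of the loop. The name sdot_simd is made up for this example. As far as I know, the compiler keeps the floating-point additions of the plain loop in program order, while @simd gives it permission to reorder the reduction so that it can be vectorized. Multi-threading would need a further explicit annotation such as Threads.@threads.

```julia
# Plain loop: no annotations, additions happen in program order.
function sdot(x, y)
    res = 0.0
    for i in eachindex(x, y)
        res += x[i] * y[i]
    end
    return res
end

# Explicit opt-in (hypothetical name): @simd allows the reduction to be
# reordered for vectorization, @inbounds removes bounds checks.
function sdot_simd(x, y)
    res = 0.0
    @inbounds @simd for i in eachindex(x, y)
        res += x[i] * y[i]
    end
    return res
end

x = rand(1000)
y = rand(1000)

# The two results agree up to floating-point rounding, not necessarily bit for bit.
println(sdot(x, y) ≈ sdot_simd(x, y))
```

So for a baseline without these speed-ups, the unannotated version is the one to time.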


Maybe it's useful to mention here that for arrays of floating-point numbers, dot actually calls the underlying BLAS implementation (OpenBLAS by default, MKL if you recompiled Julia specifically).

In this specific instance, it would perhaps be more accurate to say that dot is sped up by C (and not Julia). Consequently, perhaps dot is not the best choice for your benchmark. Could you please tell us more about the kind of tests you’re performing, and the kind of conclusions you’d like to draw from them?
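
If the comparison is kept anyway, a minimal sketch of a single-threaded setup could look like this. It assumes the BLAS library may use its own threads independently of Threads.nthreads(), which is why BLAS.set_num_threads(1) is called. The vector length 10^6 is arbitrary, and @elapsed on a single call is only a crude timing.

```julia
using LinearAlgebra

# Hand-written loop, compiled by Julia itself.
function sdot(x, y)
    res = 0.0
    for i in eachindex(x, y)
        res += x[i] * y[i]
    end
    return res
end

# BLAS has its own thread pool; pin it to one thread for the comparison.
BLAS.set_num_threads(1)

x = rand(10^6)   # arbitrary size chosen for illustration
y = rand(10^6)

# Warm-up calls so that compilation time is not included in the timings.
sdot(x, y)
dot(x, y)

t_loop = @elapsed sdot(x, y)
t_blas = @elapsed dot(x, y)

println("sdot: ", t_loop, " s")
println("dot:  ", t_blas, " s")
```

For more reliable numbers, each call would be repeated many times, for example with a benchmarking package.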


You can see what code is being generated by Julia using the @code_llvm and @code_native macros.
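
For example, a small sketch along these lines, using the sdot loop from the question. InteractiveUtils is loaded automatically in the REPL but has to be imported in a script.

```julia
using InteractiveUtils  # provides @code_llvm and @code_native outside the REPL

function sdot(x, y)
    res = 0.0
    for i in eachindex(x, y)
        res += x[i] * y[i]
    end
    return res
end

x = rand(4)
y = rand(4)

# Prints the LLVM IR generated for this method and these argument types.
@code_llvm sdot(x, y)

# The same IR can be captured as a string, e.g. to compare two sessions.
ir = sprint(code_llvm, sdot, Tuple{Vector{Float64}, Vector{Float64}})
println(length(ir), " characters of LLVM IR")
```

Vector types such as <4 x double> in the output would indicate that the loop was vectorized. Comparing the output between sessions started with different -O levels shows what the optimizer changed.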


Hi, could you please tell me where I can find out what the differences between those levels are? I cannot find an answer on Google.

In Julia, -O0 and -O1 are mostly too slow to be of use. -O2 is the default. -O3 does practically nothing over -O2.


I already tested the speed under different optimization levels, and the results are consistent with your explanation. But I would still like to know more details, because my test results raise one question:

Why do the different levels not affect BLAS (which is used in dot()), but do affect the for loop (which is used in sdot())?

Thank you very much for your time.

BLAS is an already compiled library that Julia calls into. sdot is compiled by Julia itself so the optimization level will matter there.