Slower execution with multi-threading using @threads macro

I am running into an issue where enabling multi-threading leads to slower execution. The following is the simple code I am testing with:

function sqr(x)
    return x*x
end

y = zeros(10)
for i=1:8
    y[i] = sqr(i)
end

Output:
0.008442 seconds (18.07 k allocations: 1.085 MiB)

Now, enabling multi-threading (8 threads) using the @threads macro:

function sqr(x)
    return x*x
end

y = zeros(10)
Threads.@threads for i=1:8
    y[i] = sqr(i)
end

Output:
0.038529 seconds (20.16 k allocations: 1.068 MiB)

The code utilizing 8 threads is considerably slower than the one using just 1 thread. This seems counterintuitive.

Any suggestions/reasoning on why this could be happening and how to correct it? I am using Julia 1.5.0.

  • Read Performance Tips · The Julia Language
  • Specifically, don’t use global scope.
  • Use a proper way of benchmarking (using e.g. BenchmarkTools)
  • This code is way too small for threading to be useful
  • This might not benefit from threading anyway: the operation is so simple that the bottleneck is likely to be memory speed rather than executing instructions.
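To make the first three points concrete, here is a minimal sketch of the same computation moved out of global scope and timed only after a warm-up call. The function names `fill_serial!` and `fill_threaded!` are made up for illustration and do not appear in the original post.

```julia
# Same computation as in the question, wrapped in functions so that
# nothing runs in global scope.
sqr(x) = x * x

# Hypothetical helper: fill y with squares using a plain loop.
function fill_serial!(y)
    for i in 1:length(y)
        y[i] = sqr(i)
    end
    return y
end

# Hypothetical helper: the same loop, split across the available threads.
function fill_threaded!(y)
    Threads.@threads for i in 1:length(y)
        y[i] = sqr(i)
    end
    return y
end

# The first call to each function includes compilation time.
y_serial = fill_serial!(zeros(8))
y_threaded = fill_threaded!(zeros(8))

# Time a second call so that compilation is not part of the measurement.
t_serial = @elapsed fill_serial!(y_serial)
t_threaded = @elapsed fill_threaded!(y_threaded)
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

A single `@elapsed` is still noisy for a loop this small. If BenchmarkTools is installed, `@btime fill_serial!($y_serial)` and `@btime fill_threaded!($y_threaded)` should give more reliable numbers.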

It is only counterintuitive if you assume that using threads has zero cost. You may also be surprised by this:

julia> function test2()
           y = zeros(10)
           Threads.@threads for i=1:8
               nothing
           end
       end

julia> @btime test2()
  1.820 μs (7 allocations: 1.08 KiB)

8 threads that do nothing are slower than your unthreaded code?
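To see where that fixed cost comes from, it can help to write a threaded loop by hand. The sketch below is not how `@threads` is actually implemented; `manual_threaded_foreach` and `squares_manual` are hypothetical helpers that only illustrate the kind of work involved: splitting the range, creating a task per chunk, scheduling the tasks, and waiting for all of them to finish.

```julia
# Hypothetical helper (not part of Julia): a hand-written stand-in for a
# threaded loop, to make the extra work visible.
function manual_threaded_foreach(f, r::UnitRange{Int})
    nchunks = Threads.nthreads()
    # Iterations per task; at least 1 so that an empty range is handled.
    chunk_len = max(1, cld(length(r), nchunks))
    # @sync waits until every task spawned inside the loop has finished.
    @sync for chunk in Iterators.partition(r, chunk_len)
        # Each chunk becomes its own task, which the scheduler has to place
        # on a thread. This is cheap, but it is not free.
        Threads.@spawn for i in chunk
            f(i)
        end
    end
    return nothing
end

# Hypothetical helper: compute the first n squares with the manual loop.
function squares_manual(n)
    y = zeros(n)
    manual_threaded_foreach(i -> (y[i] = i * i), 1:n)
    return y
end

squares = squares_manual(8)
println(squares)
```

When the loop body is a single multiplication, the task creation and synchronization above can easily take longer than the arithmetic itself.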

Indeed, in this case too the threaded code is slower than the unthreaded code. I am only using this simple case to observe the performance. I have larger code where I observe the same behaviour. Do you have a suggestion on a better way to parallelize independent for loops?

Threads are fine. You should benchmark your real code to see whether you get better performance. If you don't (as you said), there may be something wrong with your code. You can show us your code, and perhaps we can come up with a sound suggestion.
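As a rough illustration of when threads can pay off, here is a sketch with a deliberately heavier loop body. The `heavy` function and the iteration count are assumptions chosen for the example, not taken from the original post, and whether the threaded version actually wins depends on the machine and on how many threads Julia was started with.

```julia
# Hypothetical workload: each call does enough arithmetic that the fixed
# cost of starting and synchronizing tasks is small by comparison.
function heavy(i, n)
    s = 0.0
    for k in 1:n
        s += sin(i + k)
    end
    return s
end

function run_serial!(y, n)
    for i in 1:length(y)
        y[i] = heavy(i, n)
    end
    return y
end

function run_threaded!(y, n)
    Threads.@threads for i in 1:length(y)
        y[i] = heavy(i, n)
    end
    return y
end

# Warm-up calls with a tiny n, so compilation is not timed below.
y1 = run_serial!(zeros(8), 10)
y2 = run_threaded!(zeros(8), 10)

n = 1_000_000
t1 = @elapsed run_serial!(y1, n)
t2 = @elapsed run_threaded!(y2, n)

println("threads available: ", Threads.nthreads())
println("serial:   ", t1, " s")
println("threaded: ", t2, " s")
```

If `Threads.nthreads()` prints 1, Julia was started without extra threads and both versions do the same work on a single thread.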


Thank you @oheil. I will benchmark and see if I can find something that could be causing this issue before posting my larger code here. Also, just for reference, I found a discussion of a similar issue here.