I see strange performance issues when I use Threads.@threads to parallelize a for loop. My machine has 40 cores, and Julia is started with 40 threads:

    julia> Threads.nthreads()
    40

Now, I lose performance even with a quite simple example:

    using LinearAlgebra, SharedArrays  # det and SharedArray

    N = 500
    m = 100
    A = randn(m, N, N) |> SharedArray
    @time Threads.@threads for i ∈ 1:m
        det(A[i, :, :])
    end

This takes about 12-13 seconds and allocates 382.813 MiB.
Compare this to the serial loop:

    @time for i ∈ 1:m
        det(A[i, :, :])
    end

which runs in a little under 0.5 seconds and allocates 381.693 MiB.
The difference in allocated memory seems to come primarily from spawning the threads, since

    @time Threads.@threads for i ∈ 1:m
        1+1
    end

allocates 1010.651 KiB, i.e., approximately the difference.
Am I doing something obviously wrong, or should I throw my server out of the window?
The Julia version is 1.5.3.
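
In case it helps to reproduce this, here is a minimal self-contained sketch of the same comparison. The helper names `dets_serial` and `dets_threaded` are hypothetical and not part of my original code. Unlike the snippets above, the loops are wrapped in functions, the determinants are stored, and each function is called once before timing so that compilation is excluded. This assumes the slowdown is not just a compilation or global-scope artifact, which I have not verified.

```julia
using LinearAlgebra   # det
using SharedArrays    # SharedArray, kept only to mirror the setup above

# Hypothetical helper: serial loop over the first dimension of A.
function dets_serial(A)
    out = Vector{Float64}(undef, size(A, 1))
    for i in 1:size(A, 1)
        out[i] = det(A[i, :, :])   # A[i, :, :] copies the slice, as in the question
    end
    return out
end

# Hypothetical helper: same loop, split across Julia threads.
function dets_threaded(A)
    out = Vector{Float64}(undef, size(A, 1))
    Threads.@threads for i in 1:size(A, 1)
        out[i] = det(A[i, :, :])   # each iteration writes to its own slot
    end
    return out
end

N = 500
m = 100
A = randn(m, N, N) |> SharedArray

# Warm-up calls so that the timings below do not include compilation.
dets_serial(A)
dets_threaded(A)

@time dets_serial(A)
@time dets_threaded(A)
```

If the threaded version is still much slower with this setup, the problem is presumably not related to timing in global scope.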