for the first time I am trying to exploit multi-threading in Julia. However, it seems I am doing some rookie mistake, since it does not work. I am running this very simple code (not a MWE)
t = 0.0
ℓ = 1000
ψ = randn(6, ℓ)
f = zeros(6, ℓ)
Threads.@threads for j in 1:ℓ
f[:, j] = computeFreeDynamics(ψ[:, j], p, t)
end
where p is a structure of parameters and computeFreeDynamics is a “slow” function computing the free dynamics of a system.
I benchmarked the code with and without Threads.@threads and there is basically no change. However, if I try to use the same computeFreeDynamics function as RHS of an ODEProblem to run a MonteCarlo, Threads.@threads works as expected.
Do you have any idea what could be the problem?
Is computeFreeDynamics doing lots of matrix multiplications? if so, you will likely want to BLAS.set_num_threads(1) since otherwise all your cores will be used for the matrix multiplication, removing the ability to scale.
In general, there is no guarantee that multithreading will speed up your code by the number of threads, or even speed it up at all. The best way to improve performance is by focusing on the sequential implementation, ensuring type-stability and in particular reducing allocations. Can you share more details about computeFreeDynamics?
There’s little to go on in your example. As noted, if you’re doing some heavy lifting with linear algebra, it’s probably already parallelized.
There’s nothing wrong with the loop as such, except the allocations noted above. I.e. ψ[:, j] allocates a vector, and computeFreeDynamics also allocates a vector for return. Allocations in parallel programs can be a performance problem. But it’s possible to write fortran in any language, and make a pre allocated version computeFreeDynamics! so that the loop becomes
@threads for j in 1:ℓ
computeFreeDynamics!(@view(f[:, j]), @view(ψ[:, j]), p, t)
end
You can of course keep an allocating version around as something like