I am attempting to compute, in parallel, the eigenvalues of a list of symmetric matrices. In the code below this is done with pmap (using 10 workers) and sequentially (for comparison). I have set the number of BLAS threads used by eigvals to 2, to match the threads available on a single core. For the smaller matrix sizes the multi-core speed-up from pmap is roughly 5-8x, but for the larger sizes tested here the speed-up over the sequential calculation essentially vanishes.
What is the bottleneck for this (admittedly naïve) approach, and is there a more appropriate way to proceed with such a parallelization?
Note: this is run with OpenBLAS on Julia 1.7.0-beta3.
using Distributed, Distributions

addprocs(10)                      # start the workers before loading code on them
@everywhere using LinearAlgebra

BLAS.set_num_threads(2)           # BLAS threads for the sequential eigvals calls on the main process

for m in [10, 50, 100, 200, 500, 1000, 1500, 2000]
    A = rand(Uniform(0., 1.), m, m)
    symA = Symmetric(A)
    mats = repeat([symA], length(workers()))   # one matrix per worker
    print(m)
    @time pmap(eigvals, mats)                  # parallel: one eigvals call per worker
    @time begin                                # sequential, for comparison
        for M in mats
            eigvals(M)
        end
    end
end
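
One thing I am not sure about is whether BLAS.set_num_threads(2) on the main process carries over to the workers at all; my assumption is that it would have to be applied on each worker separately, roughly like this (untested):

@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(2)   # assumption: the BLAS thread setting must be applied per worker

The timings I get (the pmap call first, then the sequential loop, for each size):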
50 0.938094 seconds (1.18 k allocations: 452.578 KiB)
0.011508 seconds (220 allocations: 774.062 KiB)
100 0.002881 seconds (1.25 k allocations: 76.531 KiB)
0.033060 seconds (3.46 k allocations: 2.457 MiB, 22.69% gc time, 9.06% compilation time)
200 0.021528 seconds (1.35 k allocations: 92.062 KiB)
0.045115 seconds (220 allocations: 7.556 MiB)
500 0.032562 seconds (1.25 k allocations: 137.609 KiB)
0.249784 seconds (220 allocations: 41.750 MiB, 2.44% gc time)
1000 0.209816 seconds (1.27 k allocations: 214.516 KiB)
1.146097 seconds (220 allocations: 159.775 MiB, 1.18% gc time)
1500 1.519743 seconds (1.27 k allocations: 294.281 KiB)
3.110555 seconds (240 allocations: 354.096 MiB, 2.67% gc time)
2000 5.861284 seconds (1.27 k allocations: 368.609 KiB)
6.673908 seconds (240 allocations: 624.709 MiB, 0.65% gc time)
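
In case it helps frame the question, this is the kind of alternative I have been considering: a rough, untested sketch that uses Julia threads on a single process (started with e.g. julia -t 10) and pins BLAS to one thread, so each eigvals call runs single-threaded LAPACK. The helper name eigvals_threaded is just something I made up for illustration.

using LinearAlgebra

BLAS.set_num_threads(1)   # one BLAS thread per call, so the Julia threads do not oversubscribe the cores

function eigvals_threaded(mats)
    # hypothetical helper, not part of the benchmark above
    results = Vector{Vector{Float64}}(undef, length(mats))
    Threads.@threads for i in eachindex(mats)
        results[i] = eigvals(mats[i])   # each iteration runs on its own Julia thread
    end
    return results
end

I do not know whether this runs into the same bottleneck for large matrices, which is partly why I am asking.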