I am attempting to compute, in parallel, the eigenvalues of a list of symmetric matrices. In the code below this is done with `pmap` (using 10 workers) and sequentially (for comparison). I've set the number of threads used by BLAS for `eigvals` to 2, to match the threads used by a single core. For smaller matrix sizes the multi-core speed-up from `pmap` is roughly 5-8x, but for the larger sizes tested here the speed-up of `pmap` over the sequential calculation essentially vanishes.

What is the bottleneck for this (admittedly naïve) approach, and is there a more appropriate way to proceed with such a parallelization?

Note: This is run with OpenBLAS on Julia 1.7.0-beta3.

```julia
using Distributed, Distributions
addprocs(10)                          # 10 worker processes
@everywhere using LinearAlgebra       # needed on the workers for eigvals
BLAS.set_num_threads(2)               # limit OpenBLAS to 2 threads for eigvals

for m in [10, 50, 100, 200, 500, 1000, 1500, 2000]
    A = rand(Uniform(0.0, 1.0), m, m)
    symA = Symmetric(A)
    mats = repeat([symA], length(workers()))   # one copy per worker
    print(m)
    @time pmap(eigvals, mats)                  # parallel via pmap
    @time begin                                # sequential, for comparison
        for M in mats
            eigvals(M)
        end
    end
end
```

Output:

```
50 0.938094 seconds (1.18 k allocations: 452.578 KiB)
0.011508 seconds (220 allocations: 774.062 KiB)
100 0.002881 seconds (1.25 k allocations: 76.531 KiB)
0.033060 seconds (3.46 k allocations: 2.457 MiB, 22.69% gc time, 9.06% compilation time)
200 0.021528 seconds (1.35 k allocations: 92.062 KiB)
0.045115 seconds (220 allocations: 7.556 MiB)
500 0.032562 seconds (1.25 k allocations: 137.609 KiB)
0.249784 seconds (220 allocations: 41.750 MiB, 2.44% gc time)
1000 0.209816 seconds (1.27 k allocations: 214.516 KiB)
1.146097 seconds (220 allocations: 159.775 MiB, 1.18% gc time)
1500 1.519743 seconds (1.27 k allocations: 294.281 KiB)
3.110555 seconds (240 allocations: 354.096 MiB, 2.67% gc time)
2000 5.861284 seconds (1.27 k allocations: 368.609 KiB)
6.673908 seconds (240 allocations: 624.709 MiB, 0.65% gc time)
```
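
For completeness, here is a sketch of how the BLAS thread settings could be verified on the master and on each worker (this is not part of the benchmark above; `BLAS.get_num_threads` is available from Julia 1.6 onwards):

```julia
# Sketch: query the OpenBLAS thread count on the master and on every worker.
using Distributed, LinearAlgebra
@everywhere using LinearAlgebra

@show BLAS.get_num_threads()    # master process
# One value per worker process:
[remotecall_fetch(BLAS.get_num_threads, p) for p in workers()]
# If the workers should also be limited:
# @everywhere BLAS.set_num_threads(2)
```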
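
For reference, a thread-based (shared-memory) version of the same computation would look roughly like the untested sketch below; Julia would need to be started with multiple threads (e.g. `julia -t 10`), and the function name `eigvals_threaded` is just for illustration.

```julia
# Untested sketch: thread-based version of the same computation.
# Requires starting Julia with threads, e.g. `julia -t 10`.
using LinearAlgebra
BLAS.set_num_threads(1)   # one BLAS thread per call, so Julia threads don't oversubscribe

function eigvals_threaded(mats)
    out = Vector{Vector{Float64}}(undef, length(mats))
    Threads.@threads for i in eachindex(mats)
        out[i] = eigvals(mats[i])
    end
    return out
end
```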