I am attempting to compute, in parallel, the eigenvalues of a list of symmetric matrices. In the code below this is done with pmap (using 10 workers) and sequentially (for comparison). I have set the number of BLAS threads used by eigvals to 2, to match the threads available on a single core. For the smaller matrix sizes the multi-core speed-up from pmap is roughly 5-8x, but for the larger sizes tested here the speed-up over the sequential calculation essentially vanishes.
What is the bottleneck for this (admittedly naïve) approach, and is there a more appropriate way to proceed with such a parallelization?
Note: this is run with OpenBLAS on Julia 1.7.0-beta3.
using Distributed, Distributions

addprocs(10)                      # start the workers before loading code on them
@everywhere using LinearAlgebra

BLAS.set_num_threads(2)           # BLAS threads for the sequential eigvals calls on the main process

for m in [10, 50, 100, 200, 500, 1000, 1500, 2000]
    A = rand(Uniform(0., 1.), m, m)
    symA = Symmetric(A)
    mats = repeat([symA], length(workers()))   # one matrix per worker
    print(m)
    @time pmap(eigvals, mats)                  # parallel: one eigvals call per worker
    @time begin                                # sequential, for comparison
        for M in mats
            eigvals(M)
        end
    end
end
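
One thing I am not sure about is whether BLAS.set_num_threads(2) on the main process carries over to the workers at all; my assumption is that it would have to be applied on each worker separately, roughly like this (untested):

@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(2)   # assumption: the BLAS thread setting must be applied per worker

The timings I get (the pmap call first, then the sequential loop, for each size):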
50 0.938094 seconds (1.18 k allocations: 452.578 KiB)
0.011508 seconds (220 allocations: 774.062 KiB)
100 0.002881 seconds (1.25 k allocations: 76.531 KiB)
0.033060 seconds (3.46 k allocations: 2.457 MiB, 22.69% gc time, 9.06% compilation time)
200 0.021528 seconds (1.35 k allocations: 92.062 KiB)
0.045115 seconds (220 allocations: 7.556 MiB)
500 0.032562 seconds (1.25 k allocations: 137.609 KiB)
0.249784 seconds (220 allocations: 41.750 MiB, 2.44% gc time)
1000 0.209816 seconds (1.27 k allocations: 214.516 KiB)
1.146097 seconds (220 allocations: 159.775 MiB, 1.18% gc time)
1500 1.519743 seconds (1.27 k allocations: 294.281 KiB)
3.110555 seconds (240 allocations: 354.096 MiB, 2.67% gc time)
2000 5.861284 seconds (1.27 k allocations: 368.609 KiB)
6.673908 seconds (240 allocations: 624.709 MiB, 0.65% gc time)
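
In case it helps frame the question, this is the kind of alternative I have been considering: a rough, untested sketch that uses Julia threads on a single process (started with e.g. julia -t 10) and pins BLAS to one thread, so each eigvals call runs single-threaded LAPACK. The helper name eigvals_threaded is just something I made up for illustration.

using LinearAlgebra

BLAS.set_num_threads(1)   # one BLAS thread per call, so the Julia threads do not oversubscribe the cores

function eigvals_threaded(mats)
    # hypothetical helper, not part of the benchmark above
    results = Vector{Vector{Float64}}(undef, length(mats))
    Threads.@threads for i in eachindex(mats)
        results[i] = eigvals(mats[i])   # each iteration runs on its own Julia thread
    end
    return results
end

I do not know whether this runs into the same bottleneck for large matrices, which is partly why I am asking.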