Eigs on huge SparseArrays doesn't use all BLAS threads

Hi all,

I am dealing with huge sparse arrays of dimensions greater than 10^6. I am interested in finding only a few extremal eigenvalues. To this end, I tried the Arpack package, but I noticed that when the sparse array gets too big, the number of CPUs in use decreases. For example

using LinearAlgebra
using Arpack
using SparseArrays

BLAS.set_num_threads(16)

A = sprand(100_000, 100_000, 0.001)

eigs(A, nev = 20, which = :SR)

Then I get this from the top command

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
38936 ahmed     20   0 19.901g 0.016t 133852 R  1597  2.2  21:09.33 julia

Now if I increase the dimension of A

using LinearAlgebra
using Arpack
using SparseArrays

BLAS.set_num_threads(16)

A = sprand(1_000_000, 1_000_000, 0.001)

eigs(A, nev = 20, which = :SR)

Then the top command reads

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
38936 ahmed     20   0 34.919g 0.030t 133852 R 155.7  4.1  44:02.02 julia

One notices that not all CPUs are being used, and the decrease is in fact drastic! Is there any explanation for this weird behavior, or could something be wrong on my end? This has been tested on two machines, one with 12 physical cores and one with 24 physical cores.

I also tested similar packages such as KrylovKit, and they show the same behavior, which might be related to how Julia itself handles operations on huge sparse arrays. Any help is appreciated.

Sparse-matrix operations don’t (can’t) use BLAS. BLAS is only for dense-matrix operations.

If you’re using a Krylov method (or any other method that relies on matrix–vector products), then perhaps this package could be useful: https://github.com/jagot/ThreadedSparseArrays.jl. It tries to speed up the multiplication by threading over the output columns, so for matrix–vector products you have to use mul!(y, transpose(At), x) with At = ThreadedSparseMatrixCSC(A), where A is your sparse matrix.

The package is mostly a proof-of-concept right now, but seems to work.
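
In case it helps, here is a rough sketch of that usage, assuming ThreadedSparseMatrixCSC is the exported wrapper type and that Julia was started with several threads (e.g. julia -t 16); the matrix size and density are placeholders.

using LinearAlgebra
using SparseArrays
using ThreadedSparseArrays

A = sprand(100_000, 100_000, 1e-4)   # placeholder sparse matrix
At = ThreadedSparseMatrixCSC(A)      # wrap the CSC matrix

x = rand(100_000)
y = similar(x)

# Threaded matrix–vector product: computes transpose(A) * x,
# multithreading over the columns of A.
mul!(y, transpose(At), x)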

BLAS is used for things like orthogonalization and scalar products in Krylov methods, but direct sparse solvers can’t use it in a significant way (I think).

That’s true, but my impression is that usually the matrix-vector products account for the bulk of the time (unless you are building up a huge subspace without restarting).

@stevengj

 That’s true, but my impression is that usually the matrix-vector products account for the bulk of the time (unless you are building up a huge subspace without restarting).

I’ve had cases where we had to do a few hundred Krylov iterations per Newton step because our preconditioner, while scalable, was still pretty bad. The Jacobian-vector product was expensive, but the orthogonalization was expensive enough to be painful. Someone from the Trilinos project pointed me at classical Gram-Schmidt (twice!) for the orthogonalization, and that helped a lot.
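
For reference, here is a minimal sketch of that classical-Gram-Schmidt-twice (CGS2) reorthogonalization; the function name and dimensions are just for illustration, and Q is assumed to already have orthonormal columns.

using LinearAlgebra

# Orthogonalize v against the orthonormal columns of Q with two passes
# of classical Gram-Schmidt. Each pass is a pair of dense matrix-vector
# products, which maps onto BLAS much better than the long chain of
# dot products in modified Gram-Schmidt.
function cgs2!(v, Q)
    h = Q' * v        # first pass: coefficients along the columns of Q
    v .-= Q * h
    dh = Q' * v       # second pass mops up the roundoff of the first
    v .-= Q * dh
    h .+= dh
    return v, h
end

Q = Matrix(qr(rand(1000, 20)).Q)   # thin Q factor: 20 orthonormal columns
v = rand(1000)
v, h = cgs2!(v, Q)
norm(Q' * v)                       # ≈ 0: v is now orthogonal to Q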

I don’t have much experience with sparse matrices; can you expand on why sparse algorithms (e.g. SpMV) can’t use threading? Is it because they’re bandwidth-bound?

I don’t think it’s so much that they can’t, but rather that Julia’s built-in sparse library has not been threaded yet. See e.g. this open PR that the package I mentioned above was based on/inspired by.

Then of course, depending on which storage you use (column-major CSC or row-major CSR), some operations are easier to parallelize than others.
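
To illustrate the column-major point: in CSC storage each column’s nonzeros sit contiguously, and each column contributes to exactly one entry of transpose(A) * x, so the columns can be split across threads with no write conflicts. A toy kernel (not the actual implementation of any package, and the function name is made up):

using SparseArrays
using Base.Threads

# Toy threaded kernel for y = transpose(A) * x with A stored as CSC.
# Each thread handles whole columns of A and writes to distinct entries
# of y, so no locking is needed.
function threaded_At_mul_x(A::SparseMatrixCSC, x::AbstractVector)
    y = zeros(promote_type(eltype(A), eltype(x)), size(A, 2))
    rows = rowvals(A)
    vals = nonzeros(A)
    @threads for j in 1:size(A, 2)
        acc = zero(eltype(y))
        for k in nzrange(A, j)       # nonzeros of column j
            acc += vals[k] * x[rows[k]]
        end
        y[j] = acc
    end
    return y
end

A = sprand(10_000, 10_000, 1e-3)
x = rand(10_000)
threaded_At_mul_x(A, x) ≈ transpose(A) * x   # true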