Is there an easy way to parallelise matrix multiplication?

Hi,
I am working with large sparse matrices (10^5 x 10^5). I would like to make full use of the cores / threads of my CPU to accelerate the multiplication process. As of now, I am using mul!:

using SparseArrays, LinearAlgebra
n = 10000
L = sprand(Float64, n, n, 0.3)
u0 = sprand(Float64, n, 0.3)
u1 = similar(u0)
mul!(u1, L, u0);

However, this only uses one thread.
Is there a package or a trick to parallelise the multiplication? I know I could always do something like:

function parallel_mul!(u1, L, u0)
    # Note: indexing rows of a CSC matrix (L[i, :]) is slow; this is just a sketch.
    Threads.@threads for i in 1:size(L, 1)
        u1[i] = dot(L[i, :], u0)
    end
    return u1
end

But maybe there is something more straightforward? Otherwise, it may be worth making parallel_mul! more robust and implementing it in some package?

I think BLAS will run multi-threaded by default in Julia, but otherwise you can control it via BLAS.set_num_threads.
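For reference, a minimal sketch of checking and setting the BLAS thread count (BLAS.get_num_threads requires Julia ≥ 1.6):

```julia
using LinearAlgebra

# Query and set the number of threads used by the dense BLAS backend.
println(BLAS.get_num_threads())  # default thread count on this machine
BLAS.set_num_threads(4)          # use 4 threads for dense BLAS operations
println(BLAS.get_num_threads())
```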

However, a matrix-vector multiplication, as in your example, is limited by memory bandwidth, not CPU, so there’s no point running it multi-threaded.

The matrix is sparse in the example.

This is not true (dense matrix, n = 50000):

julia> BLAS.set_num_threads(1)

julia> @time mul!(u1,L,u0);
  1.429343 seconds (4 allocations: 160 bytes)

julia> BLAS.set_num_threads(8)

julia> @time mul!(u1,L,u0);
  0.577491 seconds (4 allocations: 160 bytes)

Using a dense RHS will improve performance by a lot:

julia> @time mul!(u1,L,u0);
  0.584000 seconds (4 allocations: 160 bytes)

julia> u0_d = rand(Float64, n);

julia> u1_d = similar(u0_d);

julia> @time mul!(u1_d,L,u0_d);
  0.022939 seconds (4 allocations: 160 bytes)

There is often no advantage to having the RHS vector be sparse unless you have hypersparse matrices (but then you shouldn't use the CSC sparse format anyway).
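As a concrete pattern (a sketch with made-up sizes), densify the sparse RHS once and reuse it for repeated products:

```julia
using SparseArrays, LinearAlgebra

n  = 10_000
L  = sprand(n, n, 1e-3)   # sparse matrix at a realistic density
u0 = sprand(n, 0.3)       # sparse RHS vector
u0_d = Vector(u0)         # densify once, costs O(n) memory
u1_d = similar(u0_d)
mul!(u1_d, L, u0_d)       # hits the sparse-matrix × dense-vector kernel
```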

For threaded sparse matvec (at least in the transpose case), see https://github.com/JuliaLang/julia/pull/29525. It cannot really be merged yet because the threading infrastructure in Base is not there yet.
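For intuition, here is a minimal sketch (not the linked PR's actual implementation) of why the transpose case threads easily: in CSC storage, column j of A contributes only to y[j] in y = A'x, so columns can be split across threads with no write conflicts:

```julia
using SparseArrays, LinearAlgebra

# Threaded transpose matvec, y = A' * x, for real-valued CSC matrices.
# Each thread handles a disjoint set of columns; no locking needed
# because column j writes only to y[j].
function tmul!(y, A::SparseMatrixCSC, x)
    rows = rowvals(A)
    vals = nonzeros(A)
    Threads.@threads for j in 1:size(A, 2)
        s = zero(eltype(y))
        for k in nzrange(A, j)
            s += vals[k] * x[rows[k]]
        end
        y[j] = s
    end
    return y
end
```

(The non-transposed product A*x is harder to thread over columns, since different columns scatter into the same output entries.)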


That’s interesting. On my system, there’s no difference in timing when changing the number of threads.

Note that this matrix is neither large nor sparse by the standards of realistic applications, so benchmarks of this case may be misleading.

30% nonzero is usually not sparse enough for sparse algorithms to be worthwhile (unless the sparsity has a very special pattern). Real sparse matrices typically have only a small number of nonzero elements per row, e.g. ≲ 10 is typical in FEM-like situations. A moderately sized problem of this sort (I wouldn’t call it “large”) would be 10⁵×10⁵ with 10⋅10⁵ nonzeros, or 0.01% (1e-4) density. (For example sprand(10^5, 10^5, 1e-4), though sparsity patterns are rarely random in practice.)
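To make the scale concrete, a quick check (the counts are approximate, since sprand draws nonzero positions randomly):

```julia
using SparseArrays

A = sprand(10^5, 10^5, 1e-4)  # ~10 nonzeros per row
println(nnz(A))               # roughly 10^6 stored entries
println(nnz(A) / length(A))   # density ≈ 1e-4
```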


Thanks all, now it makes sense. I knew about BLAS.set_num_threads() but it did not change anything in my case, since computation time was limited by memory bandwidth, as @Per said.

Right - the actual matrices I am playing with have 1e-3 density, with a block pattern, in which case it really makes sense to use sparse algorithms.

Wow, yes, it changes a lot! I don’t really understand why, but it is for sure the way to go.

Now that I am using dense matrices, BLAS indeed runs multi-threaded, and setting multiple threads speeds up the operation a lot.

You were using a sparse matrix, which does not use BLAS but Julia’s native sparse matrix multiplication, so the number of BLAS threads has no effect.


Oh, right… Got it!