How to speed up dense-sparse matrix multiplication where the sparse matrix is in CSC?

yhuang · March 28, 2024, 11:32pm

For a sparse matrix stored in CSC format, without using MKLSparse, left multiplying it by a dense matrix is faster than right multiplying it by a dense matrix. This is quite intuitive because the sparse matrix is in CSC, and Julia stores dense matrices in the column-major order. Here is one benchmark example.

using LinearAlgebra, SparseArrays
using BenchmarkTools

BLAS.set_num_threads(1)

n = 5000 
m = 30000 
r = 20 

ratio = m / n^2
B = sprand(n, n, ratio)
SymB = max.(B, B')

R = randn(n, r)
Rt = Matrix(R')

C = zeros(n, r)
Ct = zeros(r, n)

The result is

julia> @btime mul!($C, $SymB, $R)
       @btime mul!($Ct, $Rt, $SymB)
  2.481 ms (0 allocations: 0 bytes)
  1.132 ms (0 allocations: 0 bytes)

julia> @which mul!(C, SymB, R)
mul!(C, A, B)
     @ LinearAlgebra /u/subspace_s4/software/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:275

julia> @which mul!(C, Rt, SymB)
mul!(C, A, B)
     @ LinearAlgebra /u/subspace_s4/software/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:275

There is some significant speed difference. MKLSparse greatly speeds up the sparse-dense matrix multiplications, but dense-sparse’s performance stays the same as it’s not implemented. Here are some simple results.

julia> using MKLSparse
       @btime mul!($C, $SymB, $R)
       @btime mul!($Ct, $Rt, $SymB)
  440.872 μs (0 allocations: 0 bytes)
  1.118 ms (0 allocations: 0 bytes)

julia> @which mul!(C, SymB, R)
mul!(C::StridedMatrix{Float64}, adjA::SparseMatrixCSC{Float64, Int64}, B::StridedMatrix{Float64})
     @ MKLSparse.BLAS ~/.julia/packages/MKLSparse/4LlPk/src/BLAS/level_2_3/matmul.jl:36

julia> @which mul!(Ct, Rt, SymB)
mul!(C, A, B)
     @ LinearAlgebra /u/subspace_s4/software/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:275

I understand it’s simple to speed up the dense-sparse multiplications by performing transposes and then sparse-dense multiplications. I wonder whether there is still any possibility to let dense-sparse faster than sparse-dense as we can see previously dense-sparse is about 2x faster.

Topic		Replies	Views
Sparse matrix multiplication: SparseMatrixCSC can be ~100x slower than Base.SparseArrays.CHOLMOD.Sparse General Usage performance	4	5009	February 6, 2017
Asymmetric speed of in-place `sparse*dense` matrix product General Usage	7	1525	November 8, 2018
Speed comparison: Sparse matrix multiplication vs usual matrix multiplication General Usage question	7	3885	February 13, 2017
[ANN] Fast SpMv with CompressedSparseBlocks.jl Package Announcements performance , linearalgebra , sparse	9	709	July 26, 2022
Speeding up elementwise Vector-SparseMatrixCSC multiplication broadcasting Performance sparse	11	533	December 12, 2024

How to speed up dense-sparse matrix multiplication where the sparse matrix is in CSC?

Related topics