# How to speed up dense-sparse matrix multiplication where the sparse matrix is in CSC?

For a sparse matrix stored in CSC format, without using MKLSparse, left-multiplying it by a dense matrix (`D * S`) is faster than right-multiplying it by a dense matrix (`S * D`). This is intuitive: CSC stores the sparse entries column by column, and Julia stores dense matrices in column-major order, so `D * S` streams down columns on both sides. Here is one benchmark example.
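To make the access pattern concrete, here is a naive kernel for the fast orientation `C = D * S` with `S` in CSC. This is only an illustrative sketch, not the actual SparseArrays implementation: for each stored nonzero `S[i, j]`, it adds a scaled column of `D` into a column of `C`, so every read and write walks down a column.

```julia
using SparseArrays

# Naive kernel for C = D * S with S in CSC (illustrative sketch only).
# C[:, j] = sum over stored i of S[i, j] * D[:, i], so both the reads
# of D and the writes to C move down columns — matching Julia's
# column-major dense layout.
function dense_times_csc!(C, D, S::SparseMatrixCSC)
    fill!(C, 0)
    rows = rowvals(S)    # row indices of the stored entries
    vals = nonzeros(S)   # the stored values, column by column
    for j in 1:size(S, 2)
        for k in nzrange(S, j)       # stored entries of column j
            i = rows[k]
            v = vals[k]
            @views C[:, j] .+= v .* D[:, i]   # column-wise AXPY
        end
    end
    return C
end
```

The opposite orientation `S * D` has to gather scattered rows for each output entry, which is why it falls behind without a specialized implementation.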

```julia
using LinearAlgebra, SparseArrays
using BenchmarkTools

n = 5000
m = 30000   # target number of stored entries
r = 20

ratio = m / n^2            # density giving ≈ m nonzeros
B = sprand(n, n, ratio)
SymB = max.(B, B')         # symmetrize

R = randn(n, r)
Rt = Matrix(R')            # materialized transpose (plain dense Matrix)

C = zeros(n, r)
Ct = zeros(r, n)
```

The result is

```julia
julia> @btime mul!($C, $SymB, $R)
  2.481 ms (0 allocations: 0 bytes)

julia> @btime mul!($Ct, $Rt, $SymB)
  1.132 ms (0 allocations: 0 bytes)

julia> @which mul!(C, SymB, R)
mul!(C, A, B)
     @ LinearAlgebra /u/subspace_s4/software/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:275

julia> @which mul!(C, Rt, SymB)
mul!(C, A, B)
     @ LinearAlgebra /u/subspace_s4/software/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:275
```
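When the slow orientation `SymB * R` is the one you need, one possible workaround is to route it through the fast orientation and pay for one explicit transpose; with a thin dense factor (`r = 20` here) the copy is cheap. A sketch, exploiting the fact that `SymB` in the setup above is symmetric (for a general, non-symmetric `S` you would need `transpose(S)`, and `mul!` performance with the lazy transpose wrapper may differ):

```julia
using LinearAlgebra, SparseArrays

# Same setup as above.
n, r = 5000, 20
B = sprand(n, n, 30000 / n^2)
SymB = max.(B, B')       # symmetric
R  = randn(n, r)
Rt = Matrix(R')
C  = zeros(n, r)
Ct = zeros(r, n)

# SymB is symmetric, so C = SymB * R satisfies C' = R' * SymB: the slow
# sparse-times-dense product can be computed in the faster
# dense-times-sparse orientation plus one transpose of the thin result.
mul!(Ct, Rt, SymB)   # fast orientation (1.132 ms in the benchmark above)
transpose!(C, Ct)    # r×n -> n×r copy; cheap because r is small
```

Whether this wins overall depends on the transpose cost relative to the gap between the two orientations, so it is worth benchmarking on your own sizes.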

The speed difference is significant. MKLSparse greatly speeds up sparse-dense matrix multiplication, but dense-sparse performance stays the same because that case is not implemented. Here are some simple results.

```julia
julia> using MKLSparse

julia> @btime mul!($C, $SymB, $R)
  440.872 μs (0 allocations: 0 bytes)

julia> @btime mul!($Ct, $Rt, $SymB)
  1.118 ms (0 allocations: 0 bytes)

julia> @which mul!(C, SymB, R)