# Speed up sparse matrix multiplication

I have a sparse matrix multiplication in my code, but it's very slow. Is there any way to make it faster?

```julia
@btime A2 = D * A * D

50.816 s (18 allocations: 5.12 GiB)
```

`A`: adjacency matrix, 6554063×6554063 `SparseMatrixCSC{Float64,Int64}` with 152837785 stored entries

`D`: diagonal matrix, 6554063×6554063 `SparseMatrixCSC{Float64,Int64}` with 6554063 stored entries

If you know `D` is a diagonal matrix, make use of this information:

```julia
using BenchmarkTools, LinearAlgebra, SparseArrays

function diag_sparse_diag(D1::Diagonal, A::AbstractSparseMatrix, D2::Diagonal)
    diag1 = D1.diag
    diag2 = D2.diag
    @assert (length(diag1), length(diag2)) == size(A)
    res = similar(A)  # same sparsity structure as A, values uninitialized
    rows = rowvals(res)
    vals_A = nonzeros(A)
    vals_res = nonzeros(res)
    @inbounds for col in axes(res, 2)
        @simd for k in nzrange(res, col)
            row = rows[k]
            vals_res[k] = diag1[row] * vals_A[k] * diag2[col]
        end
    end
    res
end

n = 6554063
nvals = 152837785
A = sprandn(n, n, nvals / n^2)
D = Diagonal(randn(n))
D_sparse = sparse(D)

println(D_sparse * A * D_sparse == D * A * D == diag_sparse_diag(D, A, D))

@btime $D_sparse * $A * $D_sparse
@btime $D * $A * $D
@btime diag_sparse_diag($D, $A, $D)
```

Output:

```
true
45.017 s (16 allocations: 5.12 GiB)
4.050 s (12 allocations: 4.65 GiB)
2.415 s (6 allocations: 2.33 GiB)
```

If you are sure that `D_sparse` is a diagonal matrix, creating a `Diagonal` matrix from it via `D = Diagonal(nonzeros(D_sparse))` costs nothing, since it just wraps the existing vector of stored values. For this to be correct, the sparsity pattern of `D_sparse` must be exactly diagonal.
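
As a small sketch of that wrapping (sizes are made up; `spdiagm` builds a sparse matrix whose stored entries are exactly the diagonal, in order):

```julia
using LinearAlgebra, SparseArrays

# a sparse matrix whose stored pattern is exactly the diagonal
D_sparse = spdiagm(0 => randn(1000))

# nonzeros(D_sparse) is its vector of stored values, here exactly the
# diagonal in order, so Diagonal wraps it without copying anything
D = Diagonal(nonzeros(D_sparse))
```

Since `Diagonal` keeps a reference to the vector it is given, `D.diag` is the very same array as `nonzeros(D_sparse)`.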


So I see that sparse-sparse matrix multiplication is the slowest of the three. Can you explain why? I was expecting exactly the opposite.

`Diagonal` is a much more specific structure than `SparseMatrixCSC`. CSC matrices have relatively slow element access, whereas more specifically structured matrix types like `Diagonal`, banded, or block matrices will always be faster where applicable (often 10x faster). CSC is what you use when you aren't lucky enough to have any additional structure.
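
As a small-scale sketch of the two code paths (the sizes here are made up): multiplying by the `Diagonal` wrapper dispatches to a specialized method that just scales the stored values of `A`, while the sparse-sparse product runs the general CSC multiply kernel.

```julia
using LinearAlgebra, SparseArrays

n = 10_000
A = sprandn(n, n, 10 / n)   # ~10 stored entries per column
D = Diagonal(randn(n))
D_sparse = sparse(D)

# same mathematical result, very different code paths:
# D_sparse * A -> general sparse-sparse multiply
# D * A        -> specialized row-wise scaling of A's stored values
@assert D_sparse * A == D * A
@assert D * A isa SparseMatrixCSC   # still returns a sparse matrix
```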


The output is a sparse matrix with dimension of (6554063, 6554063). I want to sum the output with its transpose but it is very slow. Is there any way to improve this?

```julia
@btime out2 = out + sparse(out');

29.954 s (16 allocations: 6.93 GiB)
```

Don’t use `sparse(out')`, just `out'`. I benchmarked it to be noticeably (although not dramatically) faster. The main reason this operation is slow is that it allocates a lot of memory to store the result, since `out + out'` often has about twice as many nonzeros as `out`.
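
A small sketch of the suggestion (sizes made up): `out'` is a lazy wrapper, and `+` can consume it directly, skipping the explicit `SparseMatrixCSC` conversion that `sparse(out')` performs up front.

```julia
using LinearAlgebra, SparseArrays

out = sprandn(1_000, 1_000, 0.01)

# out' is a lazy Adjoint wrapper; + handles it without requiring
# the caller to materialize sparse(out') first
out2 = out + out'

@assert out2 == out + sparse(out')   # identical result
@assert issymmetric(out2)            # real entries, so A + A' is symmetric
```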


Digging through the source code, it looks like `A + A'` internally makes a `SparseMatrixCSC` copy of `A'`.

It would be nice to have an optimized method for this, since computing the Hermitian part of a matrix is a pretty common operation.
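
As a sketch of one possible workaround (not claiming it beats the built-in path, and `plus_adjoint` is a made-up name): `sparse(I, J, V)` sums duplicate `(i, j)` entries, so `A + A'` can be assembled in one shot from the entry lists, without ever forming a CSC copy of `A'`.

```julia
using SparseArrays

# list every stored entry of A twice: once as (i, j, v) and once
# flipped/conjugated as (j, i, conj(v)); sparse() adds duplicates,
# so the assembled matrix equals A + A'
function plus_adjoint(A::SparseMatrixCSC)
    i, j, v = findnz(A)
    sparse(vcat(i, j), vcat(j, i), vcat(v, conj.(v)), size(A)...)
end

A = sprandn(200, 200, 0.05)
@assert plus_adjoint(A) == A + A'
```

Diagonal entries are listed twice and summed to `v + conj(v)`, which is exactly what `A + A'` puts on the diagonal.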


It’s 3x faster. Can you explain why? `out'` is not a `SparseMatrixCSC` anymore but an adjoint, and I am not exactly sure whether that is still sparse or full. I am used to MATLAB, and in MATLAB sparse is always the fastest.

Julia has lazy adjoints: basically, `'` doesn’t copy data, it just records the fact that you took the adjoint as part of the type information. As such, for sparse matrices the adjoint preserves sparsity.
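
A quick sketch of what "lazy" means here (sizes made up): the wrapper shares `A`'s storage and only applies the index flip when you read from it.

```julia
using LinearAlgebra, SparseArrays

A = sprandn(100, 100, 0.05)
B = A'   # no copy: B is an Adjoint wrapper around A

@assert B isa Adjoint        # the transpose lives in the type
@assert parent(B) === A      # same underlying storage, nothing copied
@assert B[3, 1] == A[1, 3]   # indexing performs the flip lazily
```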
