Speed up sparse matrix multiplication

Nova · August 31, 2020, 8:20am

I have a sparse matrix multiplication in my code but it’s very slow, is there anyway to make it faster?

@btime A2 = D * A * D

 50.816 s (18 allocations: 5.12 GiB)

A: adjacency matrix, 6554063×6554063 SparseMatrixCSC{Float64,Int64} with 152837785 stored entries

D: diagonal matrix, 6554063×6554063 SparseMatrixCSC{Float64,Int64} with 6554063 stored entries

lucas711642 · August 31, 2020, 2:36pm

If you know D is a diagonal matrix, make use of this information:

using BenchmarkTools, LinearAlgebra, SparseArrays

function diag_sparse_diag(D1::Diagonal, A::AbstractSparseMatrix, D2::Diagonal)
	diag1 = D1.diag
	diag2 = D2.diag
	@assert (length(diag1), length(diag2)) == size(A)
	res = similar(A)
	rows = rowvals(res)
	vals_A = nonzeros(A)
	vals_res = nonzeros(res)
	@inbounds for col in axes(res, 2)
		@simd for k in nzrange(res, col)
			row = rows[k]
			vals_res[k] = diag1[row] * vals_A[k] * diag2[col]
		end
	end
	res
end

n = 6554063
nvals = 152837785
A = sprandn(n, n, nvals / n ^ 2)
D = Diagonal(randn(n))
D_sparse = sparse(D)

println(D_sparse * A * D_sparse == D * A * D == diag_sparse_diag(D, A, D))

@btime $D_sparse * $A * $D_sparse
@btime $D * $A * $D
@btime diag_sparse_diag($D, $A, $D)

Output:

true
  45.017 s (16 allocations: 5.12 GiB)
  4.050 s (12 allocations: 4.65 GiB)
  2.415 s (6 allocations: 2.33 GiB)

If you are sure that D_sparse is a diagonal matrix, there is no cost on creating a Diagonal matrix using D = Diagonal(nonzeros(D_sparse)). In this case, the sparsity pattern of D_sparse must be exactly diagonal.

Nova · August 31, 2020, 7:07pm

So I see that sparse matrix multiplication is the slowest among all those. Can you explain why? I was expecting exactly the opposite.

Oscar_Smith · August 31, 2020, 7:14pm

Diagonal is a much more specific structure than SparseMatrix. Specifically, CSC matrices have relatively slow access times. More specifically structured matrix types like Diagonal or Blocked or Banded, will always be faster where applicable (often 10x faster). CSC is what you use when you aren’t lucky enough to have the additional structure.

Nova · August 31, 2020, 7:30pm

The output is a sparse matrix with dimension of (6554063, 6554063). I want to sum the output with its transpose but it is very slow. Is there any way to improve this?

@btime out2 = out + sparse(out');

29.954 s (16 allocations: 6.93 GiB)

Oscar_Smith · August 31, 2020, 7:41pm

Don’t use sparse(out') just out'. I benchmark it to be noticeably (although not a ton) faster. The main reason this is slow is that it’s allocating a ton of memory to store the result since out+out' often has 2x more nonzeros than out.

stevengj · August 31, 2020, 7:52pm

Digging through the source code, it looks like A + A' internally makes a SparseMatrixCSC copy of A'.

It would be nice to have an optimized method for this, since computing the Hermitian part of a matrix is a pretty common operation.

Nova · August 31, 2020, 7:54pm

It’s 3x faster. Can you explain why it’s like that? Because "out’ " is not sparse anymore and it is adjoint (which I am not exactly sure if it’s still sparse or full). I am used to MATLAB and in MATLAB sparse is always the fastest.

Oscar_Smith · August 31, 2020, 7:55pm

Julia has lazy adjoints basically ' doesn’t copy data, it just stores the fact that you took the adjoint as part of the type info. As such for sparse matrices, the adjoint preserves sparsity.

Topic		Replies	Views
Efficient sparse-dense matrix multiplication for diagonal-like sparse matrices Performance question	8	800	January 22, 2023
Potential performance regression wrt 0.6: Products involving diagonal matrices MUCH slower than with general sparse matrices Performance	7	826	September 21, 2019
Mul! vs SparseMatrix Performance	2	1084	December 12, 2020
Right multiplication of Diagonal matrix Numerics	2	1817	August 29, 2018
Scaling a sparse matrix row-wise and column-wise too slow Performance broadcast , sparse	20	451	June 23, 2024

Speed up sparse matrix multiplication

Related topics