As I was teaching sparse matrices in class, I was humbled by a theory-practice gap.
The idea in theory: don’t use sparse matrices when you really have a dense matrix.
The idea in practice: here, let me show you in Julia!
using BenchmarkTools, SparseArrays

n = 10
@benchmark A*v setup=begin
    A = randn(n, n)
    v = randn(n)
end
@benchmark A*v setup=begin
    A = sparse(randn(n, n))
    v = randn(n)
end
And here’s what those show:
Dense case
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
Range (min … max): 86.672 ns … 798.907 ns ┊ GC (min … max): 0.00% … 85.88%
Time (median): 89.663 ns ┊ GC (median): 0.00%
Time (mean ± σ): 92.006 ns ± 23.905 ns ┊ GC (mean ± σ): 1.33% ± 4.33%
Sparse case
BenchmarkTools.Trial: 10000 samples with 965 evaluations.
Range (min … max): 81.649 ns … 804.188 ns ┊ GC (min … max): 0.00% … 86.96%
Time (median): 86.010 ns ┊ GC (median): 0.00%
Time (mean ± σ): 89.411 ns ± 31.736 ns ┊ GC (mean ± σ): 1.85% ± 4.61%
So the sparse case is about 3 ns faster than the dense case?!?
As I said, I was humbled in lecture.
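For context on the theory side: randn essentially never produces exact zeros, so sparse(randn(n,n)) stores all n² values plus a row index for each value and a column-pointer array. A quick way to see that overhead (a sketch; exact byte counts vary by platform and Julia version):

using SparseArrays
A = randn(10, 10)
S = sparse(A)
nnz(S)                 # 100: every entry is stored, so no actual sparsity
Base.summarysize(A)    # dense: 100 Float64 values plus the array header
Base.summarysize(S)    # larger: values + row indices + column pointers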
Notes
- Yes, the difference goes away for larger n, where sparse is about 3x slower than dense (see the quick check after this list).
- I haven’t seen this on any Intel/AMD x86-64 architectures.
- This does not happen on Julia 1.8, so something changed in 1.9 that made the sparse case faster.
- Using the hand-rolled kernels below shows that the dense case really should be faster…
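First, a minimal sketch of that first note, assuming the same session (n, BenchmarkTools, and SparseArrays already loaded); rerunning the comparison at a larger, made-up size should show dense pulling ahead, per the note above:

m = 1000   # hypothetical larger size for illustration
@benchmark A*v setup=begin
    A = randn(m, m)
    v = randn(m)
end
@benchmark A*v setup=begin
    A = sparse(randn(m, m))   # still fully dense data, just in CSC storage
    v = randn(m)
end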
## Copy-pasted sparse matrix-vector product without allocations
using BenchmarkTools, SparseArrays, LinearAlgebra
function _spmatmul!(C, A, B, α, β)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    size(A, 1) == size(C, 1) || throw(DimensionMismatch())
    size(B, 2) == size(C, 2) || throw(DimensionMismatch())
    nzv = nonzeros(A)   # stored values, column-major
    rv = rowvals(A)     # row index of each stored value
    β != one(β) && LinearAlgebra._rmul_or_fill!(C, β)   # scale C by β (or zero it out)
    for k in 1:size(C, 2)
        @inbounds for col in 1:size(A, 2)
            αxj = B[col, k] * α
            for j in nzrange(A, col)   # only column col's stored entries
                C[rv[j], k] += nzv[j] * αxj
            end
        end
    end
    C
end
@benchmark _spmatmul!(x, A, v, 1.0, 0.0) setup=begin
    A = sparse(randn(n, n))
    v = randn(n)
    x = zeros(n)
end
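For reading the kernel above: nonzeros, rowvals, and nzrange are the standard CSC accessors from SparseArrays. A tiny made-up example of what they return:

using SparseArrays
S = sparse([1.0 0.0; 2.0 3.0])
nonzeros(S)                    # [1.0, 2.0, 3.0]: stored values, column-major
rowvals(S)                     # [1, 2, 2]: row index of each stored value
[nzrange(S, j) for j in 1:2]   # [1:2, 3:3]: each column's slice into the above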
## Equivalent dense matrix-vector product
using BenchmarkTools, LinearAlgebra
function _matmul!(C, A, B, α, β)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    size(A, 1) == size(C, 1) || throw(DimensionMismatch())
    size(B, 2) == size(C, 2) || throw(DimensionMismatch())
    β != one(β) && LinearAlgebra._rmul_or_fill!(C, β)   # scale C by β (or zero it out)
    for k in 1:size(C, 2)
        @inbounds for col in 1:size(A, 2)
            αxj = B[col, k] * α
            for i in 1:size(A, 1)   # dense: touch every row of the column
                C[i, k] += A[i, col] * αxj
            end
        end
    end
    C
end
@benchmark _matmul!(x, A, v, 1.0, 0.0) setup=begin
    A = randn(n, n)
    v = randn(n)
    x = zeros(n)
end
@benchmark mul!(x, A, v, 1.0, 0.0) setup=begin
    A = randn(n, n)
    v = randn(n)
    x = zeros(n)
end
Let’s do our bakeoff…
Julia 1.8.4 (MacBook Air M2)
- dense _matmul! - 44.7 ns
- dense LinearAlgebra.mul! - 73.8 ns
- sparse _spmatmul! - 74.3 ns
Julia 1.9.2 (MacBook Air M2)
- dense _matmul! - 43.8 ns
- dense LinearAlgebra.mul! - 73.0 ns
- sparse _spmatmul! - 68.2 ns
The question: does anyone know what might have changed in 1.9 to make the sparse method faster?
Also, does anyone know why the dense code for small n is slower than expected?
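One way I’d probe that last question (a sketch, not something I’ve verified): call BLAS directly, which skips whatever dispatch and argument checking mul! layers on top. If BLAS.gemv! comes in well under mul! at n = 10, the gap is overhead above BLAS rather than the kernel itself:

using BenchmarkTools, LinearAlgebra
@benchmark BLAS.gemv!('N', 1.0, A, v, 0.0, x) setup=begin
    A = randn(n, n)
    v = randn(n)
    x = zeros(n)
end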