As I was teaching sparse matrices in class, I was humbled by a theory-practice gap.

The idea in theory: don’t use sparse matrices when you really have a dense matrix.
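The theory reason is storage: a fully dense matrix stored in CSC format keeps every value *plus* a row index per entry and a column-pointer array, so it strictly costs more memory and more indirection per entry. A quick sketch to see this (exact byte counts are machine-dependent):

```julia
using SparseArrays

n = 1000
A = randn(n, n)   # dense storage: n^2 Float64s, contiguous
S = sparse(A)     # CSC storage: values + row indices + column pointers

nnz(S) == n^2                               # true: randn gives no exact zeros
Base.summarysize(S) > Base.summarysize(A)   # true: the index arrays add overhead
```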

The idea in practice: here, let me show you in Julia!

```
using BenchmarkTools, SparseArrays
n = 10
@benchmark A*v setup=begin
    A = randn(n, n)
    v = randn(n)
end
@benchmark A*v setup=begin
    A = sparse(randn(n, n))
    v = randn(n)
end
```

And then running this shows:

**Dense case**

```
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
Range (min … max): 86.672 ns … 798.907 ns ┊ GC (min … max): 0.00% … 85.88%
Time (median): 89.663 ns ┊ GC (median): 0.00%
Time (mean ± σ): 92.006 ns ± 23.905 ns ┊ GC (mean ± σ): 1.33% ± 4.33%
```

**Sparse case**

```
BenchmarkTools.Trial: 10000 samples with 965 evaluations.
Range (min … max): 81.649 ns … 804.188 ns ┊ GC (min … max): 0.00% … 86.96%
Time (median): 86.010 ns ┊ GC (median): 0.00%
Time (mean ± σ): 89.411 ns ± 31.736 ns ┊ GC (mean ± σ): 1.85% ± 4.61%
```

So the sparse case is 3 ns faster than the dense case?!?

As I said, I was humbled in lecture.

**Notes**

- Yes, the difference goes away for larger n, and sparse becomes about 3x slower than dense.
- I haven’t seen this on any Intel/AMD x86-64 architectures.
- This does *not* happen on Julia 1.8, so something changed in 1.9 that made the sparse case faster.
- Running the following code shows that the dense case really should be faster…

```
## Copy-pasted sparse matrix-vector product without allocations
using BenchmarkTools, SparseArrays, LinearAlgebra

function _spmatmul!(C, A, B, α, β)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    size(A, 1) == size(C, 1) || throw(DimensionMismatch())
    size(B, 2) == size(C, 2) || throw(DimensionMismatch())
    nzv = nonzeros(A)
    rv = rowvals(A)
    β != one(β) && LinearAlgebra._rmul_or_fill!(C, β)
    for k in 1:size(C, 2)
        @inbounds for col in 1:size(A, 2)
            αxj = B[col, k] * α
            for j in nzrange(A, col)
                C[rv[j], k] += nzv[j] * αxj
            end
        end
    end
    C
end

@benchmark _spmatmul!(x, A, v, 1.0, 0.0) setup=begin
    A = sparse(randn(n, n))
    v = randn(n)
    x = zeros(n)
end

## Equivalent dense matrix-vector product
function _matmul!(C, A, B, α, β)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch())
    size(A, 1) == size(C, 1) || throw(DimensionMismatch())
    size(B, 2) == size(C, 2) || throw(DimensionMismatch())
    β != one(β) && LinearAlgebra._rmul_or_fill!(C, β)
    for k in 1:size(C, 2)
        @inbounds for col in 1:size(A, 2)
            αxj = B[col, k] * α
            for i in 1:size(A, 1)
                C[i, k] += A[i, col] * αxj
            end
        end
    end
    C
end

@benchmark _matmul!(x, A, v, 1.0, 0.0) setup=begin
    A = randn(n, n)
    v = randn(n)
    x = zeros(n)
end

@benchmark mul!(x, A, v, 1.0, 0.0) setup=begin
    A = randn(n, n)
    v = randn(n)
    x = zeros(n)
end
```
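If you want to check the larger-n behavior mentioned in the notes yourself, a minimal size sweep looks like this (a sketch; the sizes in the tuple are arbitrary, and timings depend on your machine and Julia version):

```julia
using BenchmarkTools, SparseArrays

# Compare dense vs. sparse matvec time as n grows.
for n in (10, 100, 1000)
    td = @belapsed A * v setup=(A = randn($n, $n); v = randn($n))
    ts = @belapsed A * v setup=(A = sparse(randn($n, $n)); v = randn($n))
    println("n = $n: dense $(round(1e9td, digits=1)) ns, sparse $(round(1e9ts, digits=1)) ns")
end
```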

Let’s do our bakeoff…

All runs on a MacBook Air M2:

| method | Julia 1.8.4 | Julia 1.9.2 |
| --- | --- | --- |
| dense `_matmul!` | 44.7 ns | 43.8 ns |
| dense `LinearAlgebra.mul!` | 73.8 ns | 73.0 ns |
| sparse `_spmatmul!` | 74.3 ns | 68.2 ns |

The question: does anyone know what might have changed in 1.9 that would have made the sparse method faster?

Also, does anyone know why the dense code for small n is slower than expected?
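One way to start digging (a diagnostic sketch, not an answer): check which method actually handles the sparse product on each Julia version, and inspect the generated code for vectorization. The output of both macros depends on your Julia version, which is exactly what makes them useful for a 1.8-vs-1.9 comparison.

```julia
using SparseArrays, LinearAlgebra, InteractiveUtils

n = 10
A = sparse(randn(n, n)); v = randn(n)

@which A * v                       # which method dispatches for the sparse product?
@code_llvm debuginfo=:none A * v   # look for SIMD instructions in the generated code
```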