ThreadedDenseSparseMul.jl
Threaded implementation of dense-sparse matrix multiplication, built on top of Polyester.jl.
Hello all, I’ve recently had a need to compute C ← C - D * S very fast, where D and S are dense and sparse matrices, respectively. However:
- The SparseArrays.jl package doesn’t support threaded multiplication.
- The IntelMKL.jl package doesn’t seem to support dense × sparsecsc multiplication, although one can get similar performance using that package by transposing appropriately. It also comes with possible licensing issues and is vendor-specific.
- The ThreadedSparseCSR.jl package only supports sparsecsr × dense.
- The ThreadedSparseArrays.jl package only supports ThreadedSparseMatrixCSC × dense, and it also doesn’t currently install for me.
I haven’t found an implementation for that, so I made one myself. In fact, the Polyester.jl package makes this super easy; the entire code is basically
```julia
import SparseArrays: SparseMatrixCSC, mul!; import SparseArrays
import Polyester: @batch

function SparseArrays.mul!(C::AbstractMatrix, A::AbstractMatrix, B::SparseMatrixCSC, α::Number, β::Number)
    # Each column of C depends only on the corresponding column of B,
    # so the columns can be updated independently across threads.
    @batch for j in axes(B, 2)
        C[:, j] .*= β                    # scale the existing column by β
        C[:, j] .+= A * (α .* B[:, j])   # add α * A * B[:, j]
    end
    return C
end
```
and therefore 95% of the credit should go to @Elrod.
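For context, this is just the standard 5-argument `mul!` interface (`C = α*A*B + β*C`), so the `C ← C - D * S` update from the start of the post becomes a single in-place call. A minimal usage sketch, with made-up sizes:

```julia
using SparseArrays, LinearAlgebra

D = rand(100, 200)          # dense factor
S = sprand(200, 50, 0.05)   # sparse (CSC) factor
C = rand(100, 50)           # dense output, updated in place

# 5-argument mul! computes C = α*D*S + β*C;
# α = -1, β = 1 gives exactly C ← C - D*S.
mul!(C, D, S, -1.0, 1.0)
```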
Despite its simplicity, this implementation beats SparseArrays.jl by about 2x (or even 3-4x on a good day) on my problem sizes, and it significantly outperforms MKLSparse.jl, which only supports sparse × dense and therefore has to use transposes. See some more benchmark results in the README.
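If you want a rough feel for the speedup on your own machine, a quick timing sketch (the sizes and density here are placeholders, not the ones behind the README numbers) could look like this:

```julia
using BenchmarkTools, SparseArrays, LinearAlgebra

D = rand(1_000, 2_000)
S = sprand(2_000, 10_000, 0.05)
C = zeros(1_000, 10_000)

# Times C = D*S through the 5-argument mul! interface. Run once in a fresh
# session without the package (stock SparseArrays.jl method) and once after
# `using ThreadedDenseSparseMul` (threaded method), since the package
# replaces that same mul! dispatch.
@btime mul!($C, $D, $S, 1.0, 0.0);
```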
This package currently commits type piracy, but only for the dispatch above. I’ve considered wrapping the computation in another type, but I actually believe it’s fine to load this package just like one would load MKLSparse.jl etc.; see some related discussion here and here. However, I’m happy to hear other thoughts.
I hope this package will be useful to others!