ThreadedDenseSparseMul.jl
Threaded implementation of dense-sparse matrix multiplication, built on top of Polyester.jl.
Hello all, I’ve recently had a need to compute C ← C - D * S very fast, where D and S are dense and sparse matrices, respectively. However:
- The SparseArrays.jl package doesn’t support threaded multiplication.
- The IntelMKL.jl package doesn’t seem to support dense × sparsecsc multiplication, although one can get similar performance using that package by transposing appropriately. It also comes with possible licensing issues and is vendor-specific.
- The ThreadedSparseCSR.jl package only supports sparsecsr × dense.
- The ThreadedSparseArrays.jl package only supports ThreadedSparseMatrixCSC × dense, and it also doesn’t currently install for me.
I haven’t found an implementation for that, so I made one myself. In fact, the Polyester.jl package makes this super easy; the entire code is basically
```julia
import SparseArrays: SparseMatrixCSC, mul!; import SparseArrays
import Polyester: @batch

function SparseArrays.mul!(C::AbstractMatrix, A::AbstractMatrix, B::SparseMatrixCSC, α::Number, β::Number)
    # Each column of C depends only on the corresponding column of B,
    # so the columns can be updated independently across threads.
    @batch for j in axes(B, 2)
        C[:, j] .*= β                    # scale the existing column by β
        C[:, j] .+= A * (α .* B[:, j])   # add α * A * B[:, j]
    end
    return C
end
```
and therefore 95% of the credit should go to @Elrod.
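For context, this is just the standard 5-argument `mul!` interface (`C = α*A*B + β*C`), so the `C ← C - D * S` update from the start of the post becomes a single in-place call. A minimal usage sketch, with made-up sizes:

```julia
using SparseArrays, LinearAlgebra

D = rand(100, 200)          # dense factor
S = sprand(200, 50, 0.05)   # sparse (CSC) factor
C = rand(100, 50)           # dense output, updated in place

# 5-argument mul! computes C = α*D*S + β*C;
# α = -1, β = 1 gives exactly C ← C - D*S.
mul!(C, D, S, -1.0, 1.0)
```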
Despite its simplicity, this implementation beats SparseArrays.jl by about 2x (or even 3-4x on a good day) on my problem sizes, and it significantly outperforms MKLSparse.jl, which only supports sparse × dense and therefore has to use transposes. See some more benchmark results in the README.
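If you want a rough feel for the speedup on your own machine, a quick timing sketch (the sizes and density here are placeholders, not the ones behind the README numbers) could look like this:

```julia
using BenchmarkTools, SparseArrays, LinearAlgebra

D = rand(1_000, 2_000)
S = sprand(2_000, 10_000, 0.05)
C = zeros(1_000, 10_000)

# Times C = D*S through the 5-argument mul! interface. Run once in a fresh
# session without the package (stock SparseArrays.jl method) and once after
# `using ThreadedDenseSparseMul` (threaded method), since the package
# replaces that same mul! dispatch.
@btime mul!($C, $D, $S, 1.0, 0.0);
```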
This package currently commits type piracy, but only for the dispatch above. I’ve considered wrapping the computation in another type, but I actually believe it’s fine to load this package just like one would load MKLSparse.jl etc.; see some related discussion here and here. However, I’m happy to hear other thoughts.
I hope this package will be useful to others!