Optimizing the blockdiag function for SparseArrays

irhum · May 24, 2020, 11:27am

I was looking into the blockdiag function in SparseArrays (which concatenates sparse matrices along the diagonal) and found that it is unusually slow when the number of matrices is large. For instance:

using SparseArrays
M = [sprand(rand(15:50), rand(15:50), 0.02) for i in 1:2500]
@time output = blockdiag(M...)

Measured with @time, it takes almost 1.5 seconds to complete:
1.517669 seconds (37.96 k allocations: 192.393 MiB, 4.66% gc time)

Most of the time inside the call seems to be spent on type promotion, which computes the value and index types of the output SparseArray. When those types are instead passed explicitly to a custom method of the function, the speedup for the same M is large:

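# blockdiag must be imported to add a method to it; getcolptr and
# AbstractSparseMatrixCSC are not exported by SparseArrays.
import SparseArrays: blockdiag, getcolptr, AbstractSparseMatrixCSC

# Build the block-diagonal matrix directly, with the output value type Tv
# and index type Ti supplied by the caller instead of computed by promotion.
function blockdiag(::Type{Tv}, ::Type{Ti}, X::AbstractSparseMatrixCSC...) where {Tv, Ti}
    num = length(X)
    mX = Int[ size(x, 1) for x in X ]   # rows of each block
    nX = Int[ size(x, 2) for x in X ]   # columns of each block
    m = sum(mX)
    n = sum(nX)

    colptr = Vector{Ti}(undef, n+1)
    nnzX = Int[ nnz(x) for x in X ]     # stored entries of each block
    nnz_res = sum(nnzX)
    rowval = Vector{Ti}(undef, nnz_res)
    nzval = Vector{Tv}(undef, nnz_res)

    nnz_sofar = 0
    nX_sofar = 0
    mX_sofar = 0
    for i = 1 : num
        # Copy block i, shifting column pointers by the entries already
        # written and row indices by the rows already used.
        colptr[(1 : nX[i] + 1) .+ nX_sofar] = getcolptr(X[i]) .+ nnz_sofar
        rowval[(1 : nnzX[i]) .+ nnz_sofar] = rowvals(X[i]) .+ mX_sofar
        nzval[(1 : nnzX[i]) .+ nnz_sofar] = nonzeros(X[i])
        nnz_sofar += nnzX[i]
        nX_sofar += nX[i]
        mX_sofar += mX[i]
    end
    colptr[n+1] = nnz_sofar + 1

    SparseMatrixCSC(m, n, colptr, rowval, nzval)
end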
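
The index bookkeeping in the loop can be checked by hand on a tiny case. The sketch below is only an illustration (the matrices A and B are made up): it rebuilds blockdiag(A, B) from the raw CSC arrays using the same shifts, with column pointers offset by the number of entries already stored and row indices offset by the number of rows already used.

```julia
using SparseArrays

A = sparse([1, 2], [1, 2], [10.0, 20.0], 2, 2)   # 2x2, entries at (1,1) and (2,2)
B = sparse([1], [2], [30.0], 1, 2)               # 1x2, entry at (1,2)

# Column pointers: keep A's as they are, then append B's shifted by nnz(A).
# B's first pointer is dropped because it coincides with A's last one.
colptr = vcat(SparseArrays.getcolptr(A), SparseArrays.getcolptr(B)[2:end] .+ nnz(A))
# Row indices of B move down by the number of rows in A.
rowval = vcat(rowvals(A), rowvals(B) .+ size(A, 1))
nzval  = vcat(nonzeros(A), nonzeros(B))

manual  = SparseMatrixCSC(size(A, 1) + size(B, 1), size(A, 2) + size(B, 2),
                          colptr, rowval, nzval)
builtin = blockdiag(A, B)
same    = manual == builtin   # compare the hand-built matrix with the library result
```

The function above does the same thing for any number of blocks, writing into preallocated vectors instead of calling vcat.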

@time output = blockdiag(Float64, Int64, M...)

This completes in roughly 2 milliseconds (0.002272 seconds (5.02 k allocations: 2.984 MiB)).
I’m not entirely sure why the implementation of the function spends so much time on identifying the correct types to use for the array.
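
When the matrices already live in a vector, the output types do not have to be hard-coded at the call site: they can be computed by folding promote_type over the elements rather than over a splatted argument list. A minimal sketch, where the helper name output_types is made up for illustration and is not part of SparseArrays:

```julia
using SparseArrays

# Hypothetical helper: promote the value and index types across a collection
# of sparse matrices without splatting the collection into one call.
function output_types(M)
    Tv = mapreduce(x -> eltype(nonzeros(x)), promote_type, M)
    Ti = mapreduce(x -> eltype(rowvals(x)), promote_type, M)
    return Tv, Ti
end

M = [sprand(rand(15:50), rand(15:50), 0.02) for i in 1:2500]
Tv, Ti = output_types(M)   # value type and index type common to all blocks

# Mixed value types promote to the wider type.
mixed = [sparse(Float32[1 0; 0 2]), sparse([1.0 0.0; 0.0 2.0])]
Tv_mixed, Ti_mixed = output_types(mixed)
```

The resulting pair could then be passed as the first two arguments of the typed method above, as in blockdiag(Tv, Ti, M...).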

irhum · May 24, 2020, 1:16pm

For anyone interested, there’s now an open pull request suggesting one possible fix:

github.com/JuliaLang/julia
Faster blockdiag for uniform input value-index types
JuliaLang:master ← irhum:master, opened by irhum at 01:12PM on 24 May 20 UTC (+12 -3)

The current blockdiag implementation in SparseArrays uses type promotion to identify what types the values and indices should have in the output SparseArray. This can be quite slow when the input to blockdiag is many small sparse arrays (for the sake of example, 2500 or more). If all the input sparse arrays have the same value and index types, it is not necessary to compute them via type promotion, and we can proceed directly to creating the new matrix. This can provide significant speedups. For M generated with

```julia
M = [sprand(rand(15:50), rand(15:50), 0.02) for i in 1:2500]
```

and benchmarking the existing implementation with BenchmarkTools

```julia
@benchmark output = blockdiag(M...)
```

we get

```
BenchmarkTools.Trial:
  memory estimate:  192.35 MiB
  allocs estimate:  37956
  --------------
  minimum time:     1.404 s (1.30% GC)
  median time:      1.434 s (1.31% GC)
  mean time:        1.438 s (1.24% GC)
  maximum time:     1.482 s (0.89% GC)
  --------------
  samples:          4
  evals/sample:     1
```

Running the same benchmark, with the same M, on the new implementation, we get a speedup of more than 1000x (median 1.434 s vs. 1.162 ms)

```julia
@benchmark output = blockdiag(M...)
```

```
BenchmarkTools.Trial:
  memory estimate:  3.00 MiB
  allocs estimate:  5019
  --------------
  minimum time:     1.072 ms (0.00% GC)
  median time:      1.162 ms (0.00% GC)
  mean time:        1.386 ms (11.86% GC)
  maximum time:     5.059 ms (60.41% GC)
  --------------
  samples:          3591
  evals/sample:     1
```

The current blockdiag implementation is hence split into 3 possible methods

```julia
# for the general case
blockdiag(X::AbstractSparseMatrixCSC...)

# for increased speed when all input X have the same index and value types
blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...) where {Tv, Ti <: Integer}

# internally called by the two methods above, and available to the user when
# they are dealing with heterogeneous inputs and wish to specify the output
# index and value types manually for the increased speed benefits
blockdiag(::Type{Tv}, ::Type{Ti}, X::AbstractSparseMatrixCSC...) where {Tv, Ti <: Integer}
```
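
How dispatch chooses between the first two signatures can be seen with a toy function. This is only a sketch: which_path is a made-up name, and it uses the concrete SparseMatrixCSC type rather than AbstractSparseMatrixCSC so that nothing unexported is needed. The method with shared type parameters matches only when every argument has the same value and index types; otherwise the general method is selected.

```julia
using SparseArrays

# Hypothetical toy function (not part of SparseArrays) that reports which
# method dispatch selects for a given set of arguments.
which_path(X::SparseMatrixCSC...) = :general
which_path(X::SparseMatrixCSC{Tv, Ti}...) where {Tv, Ti <: Integer} = :uniform

A = sparse([1.0 0.0; 0.0 2.0])    # value type Float64
B = sparse(Float32[1 0; 0 2])     # value type Float32

same_types  = which_path(A, A)    # all arguments share Tv and Ti
mixed_types = which_path(A, B)    # value types differ, so only the general method applies
```

With this layout the uniform case never needs to run promotion, since Tv and Ti are already known from the method signature.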
