Optimizing the blockdiag function for SparseArrays

I was looking into the blockdiag function in SparseArrays (which concatenates sparse matrices on the diagonal), only to see it is rather unusually slow when the number of matrices are large, for instance, using

M = [sprand(rand(15:50), rand(15:50), 0.02) for i in 1:2500]
@time output = blockdiag(M...)

using @time, it takes almost 1.5 seconds to complete
1.517669 seconds (37.96 k allocations: 192.393 MiB, 4.66% gc time)

most of the time inside the function call seems to be spent on computing the correct value to use for the values and indices of the SparseArray via type promotion, as when the values are passed into a custom method for the function, the speedups seem large for the same M

function blockdiag(::Type{Tv}, ::Type{Ti}, X::AbstractSparseMatrixCSC...) where {Tv, Ti}
    num = length(X)
    mX = Int[ size(x, 1) for x in X ]
    nX = Int[ size(x, 2) for x in X ]
    m = sum(mX)
    n = sum(nX)

    colptr = Vector{Ti}(undef, n+1)
    nnzX = Int[ nnz(x) for x in X ]
    nnz_res = sum(nnzX)
    rowval = Vector{Ti}(undef, nnz_res)
    nzval = Vector{Tv}(undef, nnz_res)

    nnz_sofar = 0
    nX_sofar = 0
    mX_sofar = 0
    for i = 1 : num
        colptr[(1 : nX[i] + 1) .+ nX_sofar] = getcolptr(X[i]) .+ nnz_sofar
        rowval[(1 : nnzX[i]) .+ nnz_sofar] = rowvals(X[i]) .+ mX_sofar
        nzval[(1 : nnzX[i]) .+ nnz_sofar] = nonzeros(X[i])
        nnz_sofar += nnzX[i]
        nX_sofar += nX[i]
        mX_sofar += mX[i]
    colptr[n+1] = nnz_sofar + 1

    SparseMatrixCSC(m, n, colptr, rowval, nzval)

@time output = blockdiag(Float64, Int64, M...)

which completes in roughly 2 milliseconds (0.002272 seconds (5.02 k allocations: 2.984 MiB))
I’m not entirely sure why the implementation of the function spends so much time on identifying the correct type to use for the array

For anyone interested, there’s now an open pull request suggesting one possible fix

1 Like