Fast allocation of blocks in sparse matrix

johannesnauta · February 1, 2024, 10:21am

When trying to generate a “super adjacency matrix”, that is an adjacency matrix where each of its entries is the identity matrix, I noticed a bottleneck when generating the sparse block-matrix.

More specifically, I am trying to make a sparse block matrix from a graph’s adjacency matrix, but I have noticed that when the adjacency matrix is not very sparse allocation becomes a bottleneck. As my algorithm is essentially looping over non-zero indices of the adjacency matrix, it is expected it takes more time, but I noticed that when I omit the allocation it leads to computations several order of magnitude more.

The relevant code is the following

function generate_sbm(
    n::Int, m::Int, g::SimpleGraph;
    )
    #/ Get adjacency matrix
    A = adjacency_matrix(g)
    #/ Allocate
    D = spzeros(n*m, n*m)

    #/ Fill off-diagonal blocks if adjacency matrix is non-zero
    #~ Note: this loop is a bottleneck if the graph is (relatively) dense
    for cidx in findall(!iszero, A)
        ν, μ = Tuple(cidx)        
        rows = (ν - 1) * n + 1 : ν * n
        cols = (μ - 1) * n + 1 : μ * n
        @inbounds for (i,j) in zip(rows, cols)
            D[i,j] = 1.
        end
    end
    return D
end

I have tried several options (such as D[rows,cols] .= 1., or allocating it with a sparse matrix itself), but this one turns out to be the fastest of all options I tried.

I have a feeling that there must be a (slightly) better way of making these allocations when A is “dense” (that is, the graph is highly connected). Note that in this “dense” regime the super-adjacency matrix D is still (relatively) sparse when n is large, as all entries only contain n of the n*n non-zero elements. Any help, ideas, or tips are greatly appreciated.

j-fu · February 1, 2024, 10:39am

Shameless plug: you may have a look at ExtendableSparse.jl

johannesnauta · February 1, 2024, 10:55am

I find, when changing

D = spzeros(n*m, n*m)

to

D = ExtendableSparseMatrix(n*m, n*m)

that performance is exactly the same. Anything more I need to know of ExtendibleSparse to gain performance in my case (if possible)?

Dan · February 1, 2024, 10:56am

Setting elements in sparse matrices requires some work on the data-structure, and multiplied by n*nnz(A) it adds up.
The way to go would be to generate the sparse matrix directly using:

sparse(I, J, V,[ m, n, combine])

constructor - the matrix is simple enough to manage this.

johannesnauta · February 1, 2024, 11:43am

Thanks, using this I manage to shave of several orders of magnitude. It was tricky to get the no. of non-zero entries, but once I had that it was a breeze. The code now looks like this:

function generate_sbm(
    n::Int, m::Int, g::SimpleGraph;
    )
    #/ Get adjacency matrix
    A = adjacency_matrix(g)
    dg = degree(g)
    #/ Allocate right no. of non-zero entries
    nodeswithedges = sum(dg[findall(>(0), dg)])
    Is = zeros(Int, nodeswithedges*S)
    Js = zeros(Int, nodeswithedges*S)
    Vs = zeros(Float64, nodeswithedges*S)
    count = 1
    #~ Fill
    for cidx in findall(!iszero, A)
        μ, ν = Tuple(cidx)
        d = γ ./ n_neighbors[μ]
        rows = (μ - 1) * S + 1 : μ * S
        cols = (ν - 1) * S + 1 : ν * S
        @inbounds for (i,j) in zip(rows, cols)
            Is[count] = i
            Js[count] = j
            Vs[count] = 1.
            count += 1
        end
    end
    #~ Create off-diagonal elements
    D = sparse(Is, Js, Vs, n*m, n*m)
    return D
end

Note that the Vs in this example are easy as well, but in my code I will fill it with values other than 1, hence it is in there.

j-fu · February 1, 2024, 1:28pm

Interesting, ExtendableSparse has a linked list structure as an intermediate which receives the new elements which are transferred to a CSC matrix upon a flush! call. This works well for typical FEM problems. I indeed had cases with long rows where this approach did fail performance-wise So your case might fall into that category.

mikmoore · February 1, 2024, 5:16pm

I think the I,J,V constructor is still the fastest option here, but I’ll remark that inserting values into a SparseMatrixCSC in column-major order has very little overhead when the internal buffers are preallocated. Your original code may have been considerably accelerated by using D = sizehint!(spzeros(n*m, n*m), nodeswithedges*S) to do this preallocation (it appears you were already using column-major order properly).

By the way, it looks like S is a global variable you didn’t pass to the function in your most recent code. This may be slowing things down somewhat.

Another neat trick is to use a sparse array of SMatrix from StaticArrays.jl, if your blocks are small enough to be nicely represented as SMatrix. This block matrix format can sometimes be very convenient.

johannesnauta · February 2, 2024, 6:34pm

Thanks for the suggestion on using sizehint!(...), I wasn’t aware of this functionality, so I will keep it in mind for the future.
And about the global variable, this is not the case here. It was an artifact of me quickly making a MWE and omitting too many things for clarity.
Finally, my blocks are perhaps a bit too big (~100 \times 100), so I don’t think it’ll be worth it to convert the blocks to static arrays.