When trying to generate a “super adjacency matrix”, that is, an adjacency matrix in which each non-zero entry is replaced by an identity block, I noticed a bottleneck when generating the sparse block matrix.
More specifically, I am trying to build a sparse block matrix from a graph’s adjacency matrix, and I have noticed that when the adjacency matrix is not very sparse, allocation becomes a bottleneck. Since my algorithm essentially loops over the non-zero indices of the adjacency matrix, it is expected to take more time, but when I omit the allocation the computation is several orders of magnitude faster.
The relevant code is the following:
using Graphs, SparseArrays

function generate_sbm(
    n::Int, m::Int, g::SimpleGraph;
)
    #/ Get adjacency matrix
    A = adjacency_matrix(g)
    #/ Allocate
    D = spzeros(n*m, n*m)
    #/ Fill off-diagonal blocks where the adjacency matrix is non-zero
    #~ Note: this loop is a bottleneck if the graph is (relatively) dense
    for cidx in findall(!iszero, A)
        ν, μ = Tuple(cidx)
        rows = (ν - 1) * n + 1 : ν * n
        cols = (μ - 1) * n + 1 : μ * n
        @inbounds for (i, j) in zip(rows, cols)
            D[i, j] = 1.0
        end
    end
    return D
end
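For concreteness, a call might look like this (sizes and graph are made up for the MWE; erdos_renyi is from Graphs.jl):

using Graphs

n, m = 100, 50            # block size and number of nodes (made-up values)
g = erdos_renyi(m, 0.3)   # some test graph on m nodes
D = generate_sbm(n, m, g) # (n*m) × (n*m) sparse block matrix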
I have tried several options (such as D[rows, cols] .= 1.0, or allocating it with a sparse matrix itself), but this one turned out to be the fastest of all the options I tried.
I have a feeling that there must be a (slightly) better way of making these assignments when A is “dense” (that is, when the graph is highly connected). Note that in this “dense” regime the super-adjacency matrix D is still (relatively) sparse when n is large, as each block contains only n of its n*n possible non-zero elements. Any help, ideas, or tips are greatly appreciated.
Setting elements in a sparse matrix requires some work on the data structure, and multiplied over n*nnz(A) insertions it adds up.
The way to go would be to generate the sparse matrix directly using the
sparse(I, J, V, [m, n, combine])
constructor; the matrix is simple enough to manage this.
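For illustration, with this constructor you assemble the row indices, column indices, and values up front and build the CSC matrix in a single call (toy example, unrelated to the block matrix itself):

using SparseArrays

Is = [1, 2, 4]                # row indices of the stored entries
Js = [1, 3, 4]                # column indices
Vs = [1.0, 2.0, 3.0]          # values
M  = sparse(Is, Js, Vs, 4, 4) # 4×4 sparse matrix with three stored entries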
Thanks, using this I managed to shave off several orders of magnitude. It was tricky to get the number of non-zero entries right, but once I had that it was a breeze. The code now looks like this:
function generate_sbm(
    n::Int, m::Int, g::SimpleGraph;
)
    #/ Get adjacency matrix
    A = adjacency_matrix(g)
    dg = degree(g)
    #/ Allocate the right no. of non-zero entries
    nodeswithedges = sum(dg[findall(>(0), dg)])
    Is = zeros(Int, nodeswithedges*S)
    Js = zeros(Int, nodeswithedges*S)
    Vs = zeros(Float64, nodeswithedges*S)
    count = 1
    #~ Fill
    for cidx in findall(!iszero, A)
        μ, ν = Tuple(cidx)
        #~ In the full code the values are computed here (d = γ ./ n_neighbors[μ]);
        #~ for this MWE they are simply set to 1
        rows = (μ - 1) * S + 1 : μ * S
        cols = (ν - 1) * S + 1 : ν * S
        @inbounds for (i, j) in zip(rows, cols)
            Is[count] = i
            Js[count] = j
            Vs[count] = 1.0
            count += 1
        end
    end
    #~ Create off-diagonal elements
    D = sparse(Is, Js, Vs, n*m, n*m)
    return D
end
Note that Vs in this example is trivially all ones, but in my actual code I fill it with values other than 1, which is why it is included here.
Interesting. ExtendableSparse has a linked-list structure as an intermediate which receives the new elements; these are transferred to a CSC matrix upon a flush! call. This works well for typical FEM problems. I have indeed had cases with long rows where this approach did fail performance-wise, so your case might fall into that category.
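For reference, a rough (untested) sketch of that pattern, reusing n, m, and A from your snippet, would replace the spzeros allocation with an ExtendableSparseMatrix and flush at the end:

using ExtendableSparse

E = ExtendableSparseMatrix(n*m, n*m)  # new entries first land in a linked-list intermediate
for cidx in findall(!iszero, A)
    ν, μ = Tuple(cidx)
    @inbounds for (i, j) in zip((ν-1)*n+1:ν*n, (μ-1)*n+1:μ*n)
        E[i, j] = 1.0
    end
end
flush!(E)  # move the pending entries into the internal CSC matrix

After the flush! the matrix can be used like an ordinary sparse matrix in most operations; whether this beats the I, J, V route depends on the insertion pattern, as noted above.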
I think the I, J, V constructor is still the fastest option here, but I’ll remark that inserting values into a SparseMatrixCSC in column-major order has very little overhead when the internal buffers are preallocated. Your original code may have been considerably accelerated by using D = sizehint!(spzeros(n*m, n*m), nodeswithedges*S) to do this preallocation (it appears you were already using column-major order properly).
By the way, it looks like S is a global variable you didn’t pass to the function in your most recent code. This may be slowing things down somewhat.
Another neat trick is to use a sparse array of SMatrix from StaticArrays.jl, if your blocks are small enough to be nicely represented as SMatrix. This block matrix format can sometimes be very convenient.
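A minimal (untested) sketch of that idea, with a made-up 2×2 block size:

using SparseArrays, StaticArrays

B  = SMatrix{2,2}(1.0, 0.0, 0.0, 1.0)  # a 2×2 identity block
Is, Js = [1, 2], [2, 1]                # block-level row/column indices
Vs = [B, B]                            # one SMatrix per non-zero block
M  = sparse(Is, Js, Vs, 3, 3)          # 3×3 sparse matrix whose stored entries are 2×2 blocks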
Thanks for the suggestion on using sizehint!(...), I wasn’t aware of this functionality, so I will keep it in mind for the future.
And about the global variable, this is not the case here. It was an artifact of me quickly making an MWE and omitting too many things for clarity.
Finally, my blocks are perhaps a bit too big (roughly 100 × 100), so I don’t think it’ll be worth it to convert the blocks to static arrays.