As far as I can tell, index sorting is enabled for the CHOLMOD multiplication, at least in the A*B' case (where A != B):
https://github.com/JuliaLang/julia/blob/master/base/sparse/cholmod.jl#L1274
https://github.com/JuliaLang/julia/blob/master/base/sparse/cholmod.jl#L646
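For reference, the timings below assume some large sparse matrix S; the original matrix is not shown, so this setup is only a hypothetical stand-in:

```julia
using SparseArrays   # in 0.5-era Julia, sprand/SparseMatrixCSC live in Base

# Hypothetical stand-in for S (the matrix used in the original timings is
# not shown); the @benchmark calls below additionally need BenchmarkTools.
S = sprand(10_000, 10_000, 1e-3)
```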
When I do a matrix multiplication with A != B, I get the following times:
@benchmark A2 = Base.SparseArrays.CHOLMOD.Sparse(S)*Base.SparseArrays.CHOLMOD.Sparse(2*S)'
BenchmarkTools.Trial:
memory estimate: 259.10 mb
allocs estimate: 39
--------------
minimum time: 196.704 ms (1.74% GC)
median time: 238.645 ms (16.94% GC)
mean time: 237.304 ms (17.05% GC)
maximum time: 275.710 ms (28.01% GC)
--------------
samples: 22
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
If A = B, a different CHOLMOD routine (aat) is called instead, which is only slightly faster:
@benchmark A2 = Base.SparseArrays.CHOLMOD.Sparse(S)*Base.SparseArrays.CHOLMOD.Sparse(S)'
BenchmarkTools.Trial:
memory estimate: 220.99 mb
allocs estimate: 32
--------------
minimum time: 177.867 ms (1.62% GC)
median time: 194.884 ms (1.47% GC)
mean time: 195.765 ms (1.51% GC)
maximum time: 235.854 ms (1.35% GC)
--------------
samples: 26
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
Also note that the full round trip (SparseMatrixCSC → CHOLMOD.Sparse → SparseMatrixCSC) is still 30x faster than doing the matrix multiplication in SparseMatrixCSC directly:
@time A3 = sparse(Base.SparseArrays.CHOLMOD.Sparse(S)*Base.SparseArrays.CHOLMOD.Sparse(S)');
0.275268 seconds (49 allocations: 274.310 MB, 27.35% gc time)
Digging a little deeper, CHOLMOD sorts the matrix product using a double transpose:
https://github.com/PetterS/SuiteSparse/blob/master/CHOLMOD/Core/cholmod_transpose.c#L1113
In fact, this option (double transpose) is also implemented inside Julia:
https://github.com/JuliaLang/julia/blob/7fb758a275a0b4cf0e3f4cbf482c065cb32f0011/base/sparse/sparsematrix.jl#L3260
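The trick itself can be sketched in a few lines (a toy illustration of the idea, not the CHOLMOD code):

```julia
using SparseArrays

# Double-transpose sketch: transposing a CSC matrix once produces output
# whose row indices are sorted within each column, so transposing twice
# yields the original matrix with sorted indices.
A  = sprand(1_000, 1_000, 0.01)
At = sparse(A')          # first transpose (already index-sorted)
A2 = sparse(At')         # second transpose: A again, now sorted
@assert A2 == A
```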
If I adapt Julia's spmatmul function (called myspmatmul below) to force the double transpose, I get a speed-up of ~25x:
julia> @time A = S*S'; @time A2 = myspmatmul(S,S');
8.137340 seconds (2.99 M allocations: 312.158 MB, 2.98% gc time)
"now doing double transpose" = "now doing double transpose"
0.308712 seconds (44 allocations: 175.272 MB, 35.40% gc time)
In essence, I just replaced the line (in spmatmul):
C = SparseArrays.sortSparseMatrixCSC!(Cunsorted, sortindices=sortindices) # sortindices defaults to `:sortcols`
by this
C = SparseArrays.sortSparseMatrixCSC!(Cunsorted, sortindices=:doubletranspose)
The speed-up actually depends on the size of the matrices. For small matrices (smaller than 170x170 in my case), the current default method (:sortcols) is faster, but beyond 170x170 the double-transpose method is more efficient. As far as I can tell, CHOLMOD uses only the double transpose, which seems quite reasonable imho, since sparse matrices tend to be quite large in many applications.
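If one wanted to keep both paths, the choice could be sketched as a size-based switch (the 170x170 crossover is just what I measured above; it will surely vary with density and hardware, so treat the threshold as hypothetical):

```julia
# Hypothetical heuristic: pick the sortindices strategy from the matrix
# dimension. The 170 crossover comes from the timings above, not from any
# universal constant.
choose_sortindices(n::Integer) = n <= 170 ? :sortcols : :doubletranspose
```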