Potential performance regression wrt 0.6: Products involving diagonal matrices MUCH slower than with general sparse matrices

Hello. It so happens that I need to normalize sparse (real) matrices from time to time. Moving to v0.7 (and later), I have come across some surprising behaviour involving Diagonal (sparse) matrices.

This example illustrates the point (of course, in real life, I do not multiply small, sparse identity matrices, but the performance drop is the same):

using SparseArrays
using LinearAlgebra
using Profile

function diag_times_sparse(nrows, pswitch)
    spmat = sparse(1.0*I,nrows, nrows)
    diagmat = Diagonal(diag(spmat))
    if pswitch > 1
       @profile resmat = diagmat*spmat*diagmat
       Profile.print(C=true, sortedby=:count)
    else
       @time resmat = diagmat*spmat*diagmat
    end
    return resmat
end

function sparse_times_sparse(nrows)
    spmat1 = sparse(1.0*I,nrows, nrows)
    spmat2 = sparse(1.0*I,nrows, nrows)
    @time resmat = spmat2*spmat1*spmat2
    return resmat
end

diag_times_sparse(5,1)
sparse_times_sparse(5)

diag_times_sparse(1000,1)
#diag_times_sparse(1000,2) if one wants profile data
sparse_times_sparse(1000)

On my machine (julia 0.6.4, Linux x86_64, binary distribution), I get as output
0.000016 seconds (8 allocations: 864 bytes)
0.187978 seconds (35.96 k allocations: 1.816 MiB)
0.000032 seconds (8 allocations: 47.844 KiB)
0.000103 seconds (20 allocations: 158.656 KiB)
the first two lines being the calls to get the functions compiled.

With v0.7 (again, plain vanilla binary download), I get:
0.000019 seconds (12 allocations: 1.063 KiB)
0.146651 seconds (175.73 k allocations: 8.672 MiB)
1.675994 seconds (14 allocations: 48.172 KiB)
0.000080 seconds (18 allocations: 158.469 KiB),

and finally, 1.3rc2 (plain vanilla binary distribution):
0.000022 seconds (18 allocations: 1.313 KiB)
0.000598 seconds (16 allocations: 1.438 KiB)
3.462305 seconds (22 allocations: 48.531 KiB)
0.001336 seconds (16 allocations: 122.250 KiB)

In v0.7, the profiler shows that one spends one’s time in the interpreter, even during the second call.

A hand-rolled normalization function works just fine.

Hope this helps to put someone on the right track.

Keep the impressive work going!

2 Likes

(Hello Frank and welcome back!)

Thanks for this report. After some investigation, it looks like a specialization of the Matrix*Matrix multiplication is defined here for the specific case of a SparseMatrix*Diagonal product. However, this specialization expects the Diagonal operand to be defined by a Vector subtype:

function mul!(C::AbstractSparseMatrixCSC, A::AbstractSparseMatrixCSC, D::Diagonal{T, <:Vector}) where T

whereas in your specific example the type of the Diagonal matrix is:

julia> typeof(diagmat)
Diagonal{Float64,SparseVector{Float64,Int64}}

julia> typeof(diagmat) <: Diagonal{Float64, <:Vector}
false

julia> typeof(diagmat) <: Diagonal{Float64, <:AbstractVector}
true

I would think that this is an overspecialization of this implementation of mul!, which could safely be fixed to accept Diagonal{Float64, <:AbstractVector}.

Below is a complete example monkey-patching SparseArrays in order to demonstrate that this would greatly improve performances (I’m on v1.1.0, hence the slightly different code w.r.t the version linked above):

using SparseArrays

# multiply by diagonal matrix as vector
@eval SparseArrays function mul!(C::SparseMatrixCSC, A::SparseMatrixCSC, D::Diagonal{T, <:AbstractVector}) where T
    m, n = size(A)
    b    = D.diag
    (n==length(b) && size(A)==size(C)) || throw(DimensionMismatch())
    copyinds!(C, A)
    Cnzval = C.nzval
    Anzval = A.nzval
    resize!(Cnzval, length(Anzval))
    for col = 1:n, p = A.colptr[col]:(A.colptr[col+1]-1)
        @inbounds Cnzval[p] = Anzval[p] * b[col]
    end
    C
end


using LinearAlgebra
using BenchmarkTools

nrows = 1000
spmat1 = sparse(1.0*I,nrows, nrows);
spmat2 = sparse(1.0*I,nrows, nrows);
diagmat = Diagonal(diag(spmat2));
@btime $spmat1*$diagmat;
@btime $spmat1*$spmat2;

yielding:

julia> @btime $spmat1*$diagmat;
  48.232 μs (4 allocations: 23.92 KiB)

julia> @btime $spmat1*$spmat2;
  19.026 μs (7 allocations: 61.11 KiB)

There is still some loss of performance w.r.t the SparseMatrix*SparseMatrix product, but at least the order of magnitude/complexity is now right.


Unless someone more knowledgeable chimes in to point out a flaw in the proposal above, I’ll try to submit a PR to fix this.

3 Likes

Your solution looks reasonable to me @ffevotte. Thanks for offering to submit a PR! Although I think there should continue to a be a specialized method for the case of multiplying sparse and diagonal matrices, the fact that very small oversights like this can lead to falling back to generic abstract array methods and give these absolutely massive performance hits is yet another argument for pursuing something like what @klacru proposed in https://github.com/JuliaLang/julia/pull/31563. It would be really great if when there was a method like this missing it fell back to a (sparse,AbstractArray) method, rather than an unuseably slow fully generic (AbstractArray, AbstractArray) method.

@klacru, any chance that 31563 or something like it will get resurrected anytime soon?

1 Like

I just resolved merging conflicts an hope for somebody to review this PR.

1 Like

I think so too.

I think the point here is to avoid over specializations. Here is another low-hanging fruit:

(Of course, solving the array wrapper problem is an issue that needs more dedicated solution.)

Loosening some of these over-specialized method signatures sounds like a good idea to me and would fix a good number of these falling back to generic abstract array method bugs. Even after taking that approach though, there could still be cases that are missed and fall back to dense array routines. My main point in the paragraph that you quoted was just that it would be good if there was a mechanism (maybe something like 31563, maybe something else) to catch the cases that fall through the cracks so that they fall back to (op)(Sparse, AbstractArray) methods instead of (op)(AbstractArray, AbstractArray) methods.

Perhaps that’s what you meant by

but I thought I’d clarify my point.

Edit: I should add that I really appreciate that folks like you @tkf, @klacru, @andreasnoack and others (apologies to the many I’ve missed) have been thinking hard about sparse arrays and sparse linear algebra in julia. I’ve been an occasional contributor to and frequent observer of the code base. Haven’t been able to think much about it lately but I’m glad that there are people taking this code seriously and keeping it moving forward.

1 Like

I totally agree that we need a better approach. Maybe a better type hierarchy and/or maybe some trait-based solution. Let me also mention that for highly overloaded functions like *, another problem is “ambiguity resolution hell”; you have to repeat redundant definitions to make julia happy. I hope that we can come up with a new solution that addresses all these problems.

1 Like
2 Likes