How to efficiently update a Sparse Matrix (Diagonal + Sparse) in-place inside a hot loop?

Hi everyone,

I am working on an iterative algorithm where I need to construct and solve a linear system inside a “hot loop”.

Currently, my code allocates a new sparse matrix A at every iteration because of the operation between a Diagonal matrix and a SparseMatrix.

Something like:

using LinearAlgebra, SparseArrays, BenchmarkTools

function mwe(y::Vector{T}, max_iter::Int=10) where T
    m = length(y)

    # Second-difference operator D and the fixed sparse matrix AA = D'D.
    # fill(-2one(T), m) keeps the element type T (a literal -2.0 would promote to Float64).
    D = spdiagm(0 => fill(-2one(T), m), 1 => ones(T, m - 1), -1 => ones(T, m - 1))
    AA = D' * D

    # Buffers
    w = ones(T, m)
    tmp_rhs = similar(y)
    result = similar(y)
    residuals = similar(y)

    for _ in 1:max_iter
        A = Diagonal(w) * AA   # allocates a new sparse matrix every iteration

        tmp_rhs .= w .* y
        result .= A \ tmp_rhs  # also allocates (factorization + solution vector)

        # weight update (not used in this MWE's return value)
        residuals .= y .- result
        w .= 1 ./ (1 .+ abs.(residuals))
    end

    return result
end

Bonus point: is it possible to solve the linear system in-place, into the result vector? How can I do that?

Just to clarify: inside your code I see a multiplication between the diagonal and the sparse matrix, not an addition as in the title. Which is it?

Typo, sorry: I meant the multiplication.

But, mainly, I would like to solve the linear system in-place, directly into the result vector.

You can multiply by a diagonal matrix in-place by looping over rowvals and nonzeros, multiplying each stored element by the diagonal entry of its row. In particular, allocate A = copy(AA) outside of the loop, and then do:

for i in eachindex(nonzeros(A))
    nonzeros(A)[i] = w[rowvals(A)[i]] * nonzeros(AA)[i]
end

or, more compactly:

nonzeros(A) .= getindex.(Ref(w), rowvals(A)) .* nonzeros(AA) # A = Diagonal(w) * AA

For the in-place solve, you need to:

  1. Allocate the LU factorization F = lu(A) outside of the loop.
  2. Re-use it by computing lu!(F, A) in-place inside the loop (this still has a few allocations).
  3. Use ldiv!(result, F, tmp_rhs) to apply the factorization and solve the system in-place (see the sketch below).
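
Putting those steps together with the in-place row scaling above, a minimal sketch of the loop body, assuming A = copy(AA), F = lu(A), and the buffers from your MWE were allocated before the loop:

for _ in 1:max_iter
    # in-place A = Diagonal(w) * AA, reusing A's storage
    nonzeros(A) .= getindex.(Ref(w), rowvals(A)) .* nonzeros(AA)
    lu!(F, A)                   # re-factorize in-place (a few allocations remain)
    tmp_rhs .= w .* y
    ldiv!(result, F, tmp_rhs)   # solve into result without allocating a new vector
    residuals .= y .- result
    w .= 1 ./ (1 .+ abs.(residuals))
end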

However, if you care about efficiency, there are probably other things that will have a bigger impact. Allocations aren’t usually the slowest part of solving large sparse systems. Instead, I would think about the following issues:

  1. Your matrix A is banded. You might want to exploit this structure instead of using generic sparse matrices, e.g. with BandedMatrices.jl.
  2. You are computing result = (W * AA) \ (W * y) where W = Diagonal(w). But this is equivalent to computing AA \ y, since the W factors cancel. Why are you multiplying by W at all? Is this a typo?
  3. Assuming (2) is a typo, you still don’t need to update your sparse matrix: (W * AA)⁻¹ = AA⁻¹ W⁻¹, so you can simply divide by W first (equivalent to dividing elementwise by w) and then left-divide by AA. Since AA is not changing at all in your loop, you can compute its LU factorization F = lu(AA) once and re-use it (see the sketch after this list).
  4. The matrix AA = D' * D is symmetric positive-definite, so rather than LU factorization, you can take advantage of this structure by constructing AA = Symmetric(D' * D) and then using the F = cholesky(AA) factorization instead (more than twice as fast).
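
For example, combining (3) and (4), a minimal sketch, where b stands for whatever right-hand side you actually want (a hypothetical placeholder, not a name from your code):

AA = Symmetric(D' * D)
F = cholesky(AA)         # factorize once, outside the hot loop

# inside the loop, solving (Diagonal(w) * AA) * x = b reduces to:
tmp_rhs .= b ./ w        # apply W⁻¹ elementwise (b is a placeholder)
result .= F \ tmp_rhs    # solve AA * x = W⁻¹ * b, re-using the factorization

The F \ tmp_rhs solve still allocates its result vector, but it skips the re-factorization, which is the expensive part; if your SparseArrays version provides ldiv! for CHOLMOD factorizations, you can eliminate that allocation too.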

When you are doing large-scale linear algebra, you mainly want to think about the overall algebraic structure and the linear algebra, rather than trying to micro-optimize your loops and allocations.
