I am working on an iterative algorithm where I need to construct and solve a linear system inside a "hot loop".
Currently, my code allocates a new sparse matrix A at every iteration because of the product of a Diagonal matrix and a SparseMatrixCSC.
Something like:
```julia
using LinearAlgebra, SparseArrays, BenchmarkTools

function mwe(y::Vector{T}, max_iter::Int=10) where T
    m = length(y)
    # tridiagonal second-difference operator
    D = spdiagm(0 => -2 * ones(T, m), 1 => ones(T, m-1), -1 => ones(T, m-1))
    AA = D' * D
    # Buffers
    w = ones(T, m)
    tmp_rhs = similar(y)
    result = similar(y)
    residuals = similar(y)
    for _ in 1:max_iter
        A = Diagonal(w) * AA    # allocates a new sparse matrix every iteration
        tmp_rhs .= w .* y
        result .= A \ tmp_rhs   # allocates a new factorization every iteration
        # not used in mwe
        residuals .= y .- result
        w .= 1 ./ (1 .+ abs.(residuals))
    end
    return result
end
```
Bonus point: is it possible to solve the linear system in place, writing into the result vector?
How can I do that?
You can multiply by a diagonal matrix in place by looping over rowvals and nonzeros, multiplying each stored element by the diagonal entry of the corresponding row. In particular, allocate A = copy(AA) outside of the loop, and then do:
```julia
for i in eachindex(nonzeros(A))
    nonzeros(A)[i] = w[rowvals(A)[i]] * nonzeros(AA)[i]
end
```
or, more compactly:
```julia
nonzeros(A) .= getindex.(Ref(w), rowvals(A)) .* nonzeros(AA)  # A = Diagonal(w) * AA
```
For the bonus question (solving in place), you need to:

1. Allocate the LU factorization F = lu(A) outside of the loop.
2. Re-use it with lu!(F, A) to refactorize in place inside the loop (this still has a few allocations).
3. Use ldiv!(result, F, tmp_rhs) to apply the factorization and solve the system in place, as in the sketch below.
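Putting the pieces together, here is a minimal sketch of the resulting loop, reusing the buffer names from your mwe (the reweighting logic is unchanged):

```julia
A = copy(AA)   # same sparsity pattern as AA, overwritten each iteration
F = lu(A)      # symbolic + numeric factorization, allocated once
for _ in 1:max_iter
    # A = Diagonal(w) * AA, updating only the stored values:
    nonzeros(A) .= getindex.(Ref(w), rowvals(A)) .* nonzeros(AA)
    lu!(F, A)                  # numeric refactorization, reusing the symbolic analysis
    tmp_rhs .= w .* y
    ldiv!(result, F, tmp_rhs)  # result = A \ tmp_rhs without a new factorization
    residuals .= y .- result
    w .= 1 ./ (1 .+ abs.(residuals))
end
```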
However, if you care about efficiency, there are probably other things that will have a bigger impact. Allocations aren’t usually the slowest part of solving large sparse systems. Instead, I would think about the following issues:
1. Your matrix A is banded. You might want to exploit this structure instead of using a generic sparse matrix, e.g. via BandedMatrices.jl.
2. You are computing result = (W * AA) \ (W * y) where W = Diagonal(w). But this is equivalent to computing AA \ y, since the two factors of W cancel. Why are you multiplying by W at all? Is this a typo?
3. Assuming (2) is a typo, you still don't need to update your sparse matrix: $(W \, \texttt{AA})^{-1} = \texttt{AA}^{-1} W^{-1}$, so you can simply divide by W first (equivalent to dividing elementwise by w) and then left-divide by AA. Since AA is not changing at all in your loop, you can compute its LU factorization F = lu(AA) once and re-use it.
4. The matrix AA = D' * D is symmetric positive-definite, so rather than an LU factorization you can take advantage of this structure: declare AA = Symmetric(D' * D) and use the F = cholesky(AA) factorization instead (more than twice as fast). A sketch combining (3) and (4) follows this list.
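For concreteness, here is a sketch of what (3) and (4) look like together, again reusing the names from your mwe, under the assumption that the system you actually want is (W * AA) * result = y, i.e. that the W on the right-hand side in your code was the typo. (Note that the sparse cholesky here goes through CHOLMOD, which factors Float64 data.)

```julia
AA = Symmetric(D' * D)   # declare the symmetry explicitly (sparse, SPD)
F = cholesky(AA)         # factor once; AA never changes inside the loop
for _ in 1:max_iter
    tmp_rhs .= y ./ w        # dividing elementwise by w replaces the matrix update
    result .= F \ tmp_rhs    # result = AA \ (y ./ w); F \ b allocates its output,
                             # but no refactorization happens in the loop
    residuals .= y .- result
    w .= 1 ./ (1 .+ abs.(residuals))
end
```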
When you are doing large-scale linear algebra, you mainly want to think about the overall algebraic structure of your problem and the choice of factorization, rather than trying to micro-optimize your loops and allocations.