Solving shifted linear system

I would like to solve a large shifted linear system: I have to solve (A + k^2 I)x = y, where A is symmetric positive definite, y is a vector, k is a scalar, and I is the identity matrix. The matrix A is large (on the order of 10000 x 10000), but it is banded with a relatively small bandwidth. I can get the Cholesky factor of A, but I don’t know how to make use of this to solve the shifted linear system.

Are there any Julia implementations of Krylov subspace algorithms (e.g., CG, GMRES) for efficiently solving large shifted linear systems?

Thanks for any help.
Ravi

See Linear System Solvers · LinearSolve.jl. It supports CG, GMRES, and pretty much everything else. Note that for banded matrices, you likely want to be using BandedMatrices.jl

I assume you need to solve it for many different shifts k^2? (Otherwise, you could just compute the Cholesky factorization of A + k^2 I, or use some other solver technique on this matrix.)

One option is to use a Hessenberg factorization A = Q H Q^* of A, which is advantageous because it immediately gives you the Hessenberg factorization of A + k^2 I = Q (H + k^2 I) Q^*, and given a Hessenberg factorization you can solve linear systems quickly. If A is Hermitian, then H is tridiagonal.

Good news: Julia includes a hessenberg function to compute Hessenberg factorizations, and includes routines to efficiently solve shifted linear systems given this Hessenberg factorization (added in julia#31853).
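For example, here is a minimal sketch of how this looks for a dense symmetric matrix (the size, matrix, and shift values below are arbitrary placeholders):

using LinearAlgebra

n = 1000
B = randn(n, n)
A = Symmetric(B'B + I)       # placeholder symmetric positive-definite matrix
y = randn(n)
F = hessenberg(A)            # one-time factorization; F.H is SymTridiagonal since A is symmetric
for k in (0.1, 1.0, 10.0)
    x = (F + k^2 * I) \ y    # solve the shifted system by reusing F (no new factorization of A + k^2*I)
    @show k, norm((A + k^2 * I) * x - y)
end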

Bad news: Julia’s current Hessenberg factorization is based on LAPACK functionality for dense matrices, and does not exploit banded-matrix structure. In principle, I think you should be able to implement a much faster (linear complexity?) Hessenberg factorization in this case using Givens rotations. Because Julia allows you to write fast inner-loop code, it’s quite reasonable to think about implementing this yourself.

I don’t know of a good way to use Cholesky factors of A to solve shifted linear systems with A + k^2 I, in general. If the shift k^2 is fairly small, however, you might use the Cholesky factorization of A as a preconditioner for an iterative linear solver. (Since this problem is symmetric positive-definite, the best iterative method is probably preconditioned CG, which can be found in LinearSolve.jl or IterativeSolvers.jl, for example.)
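For instance, a rough sketch with IterativeSolvers.jl (assuming its cg accepts a left preconditioner via the Pl keyword, which can be any object supporting ldiv!, such as a Cholesky factorization; the matrix and shift below are placeholders):

using LinearAlgebra, IterativeSolvers

n = 2000
B = randn(n, n)
A = Symmetric(B'B + I)          # placeholder SPD matrix
y = randn(n)
P = cholesky(A)                 # Cholesky of the unshifted A, reused as a preconditioner
k = 0.3
x = cg(A + k^2 * I, y; Pl = P)  # preconditioned CG on the shifted system
@show norm((A + k^2 * I) * x - y)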

It’s not obvious if this will be faster than just re-factorizing A + k^2 I for each k — for banded matrices, sparse linear solvers are pretty fast (linear time). You’ll have to experiment and see.

Note that the BandedMatrices.jl package provides specialized direct solvers for banded matrices, which should be more efficient than using a generic sparse solve. Again, because banded solves are so fast (for small bandwidth), the first resort is the simplest approach of just re-solving for each k.

BandedMatrices.jl has a symmetric tridiagonalization routine BandedMatrices.tridiagonalize, which is exactly this linear complexity reduction. However it doesn’t seem to return the rotations, which are needed to solve the system. It wouldn’t be too hard to modify it to return rotations. But if the bandwidth is really fairly small, I’d be a little surprised if it were much faster than just recomputing the banded Cholesky for the full shifted band matrix for every k.

Interestingly, LAPACK itself contains such a routine dsbtrd (which is called by BandedMatrices.jl), but it also doesn’t return the Givens rotations (it only lets you compute a dense Q matrix, grrr).

It would be nice to implement a hessenberg method for BandedMatrix that uses the same API as the dense methods (returning Q implicitly as a product of rotations).

Indeed. I knew this had to be in LAPACK somewhere and had found dsbtrd. But then I noticed the routine in BandedMatrices.jl with a Julia fallback for non-BlasReal/BlasComplex types. It is a shame that no one seems to want to share the rotations. There is more to be gained by providing them for banded matrices than for dense, if the alternative is to form Q. I actually spent some time looking through the dsbtrd code to see if I could see where the rotations might have overwritten the matrix or the workspace. They are probably there somewhere, but that way madness lies.

Dear Steven and Mike,

Thank you very much for answering my query. This is helpful, and yet I have to confess that I am still unclear. I have never used the Hessenberg factorization before. I know the spectral and SVD decompositions, where there is a similar idea: the diagonal shift is absorbed into the middle diagonal matrix. Leaving aside the issue of how to perform an efficient Hessenberg factorization for a banded matrix, my question is: how do I use the Hessenberg factorization to solve the shifted linear system? A toy example in Julia would be very helpful.

Thank you,
Ravi

Given that you said your bandwidth is relatively small, I’d just go with banded Cholesky. Something like

using LinearAlgebra
using BandedMatrices

n = 10
B = brandn(Float64, n, n, 1, 1)  # random banded matrix with lower and upper bandwidth 1
B = B * B'                       # positive definite, with lower and upper bandwidth 2
F = cholesky(Hermitian(B))
b = randn(n)
x = F \ b
@show norm(B * x - b, Inf) / opnorm(B, Inf) / norm(x, Inf)

If the bandwidth is independent of n, this is an O(n) algorithm.

The discussion about doing Hessenberg or tridiagonal decompositions is really of more use for large bandwidth or even for general dense matrices. Since your matrix is symmetric, you would want to go with tridiagonal. The idea is that if you factor

B = QTQ^T

where T is tridiagonal and Q is orthogonal, then

B + k^2 I = QTQ^T + k^2 QQ^T = Q (T+k^2 I) Q^T.

Solving (B+k^2 I) x = b is then equivalent to solving

(T+k^2 I) (Q^T x) = Q^T b

Cholesky factorization of T + k^2 I is O(n), regardless of any band structure (or lack thereof) in B. Solving for Q^T x is O(n) after you compute Q^T b. But multiplying by Q^T is O(n^2), so for small-bandwidth B, this is actually slower than computing a new Cholesky factorization of each shifted B.

It’s a big win for general dense matrices in which you have to apply lots of shifts, because you pay the price of the tridiagonal reduction once and from then on what was an O(n^3) process becomes an O(n^2) process for each shift, with the main cost being multiplication by Q and Q^T. If you can get the rotations used to form Q, then you can reduce it back to O(n) in the banded case, but neither LAPACK nor BandedMatrices.jl gives you the rotations, which is unfortunate. And, in any event, it’s not likely to be faster than just doing Cholesky repeatedly on B + k^2 I unless the bandwidth is large. It’s a nice trick to know about, but I wouldn’t try to go down this road if your bandwidth is small relative to n. The Hessenberg factorization allows a similar trick for the nonsymmetric case.
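To make the bookkeeping concrete, here is a rough sketch of the dense version using Julia’s hessenberg (which for a symmetric matrix returns T as F.H and Q implicitly as F.Q); the matrix, size, and shift values are just placeholders:

using LinearAlgebra

n = 500
B0 = randn(n, n)
B = Symmetric(B0'B0 + I)        # placeholder SPD matrix
b = randn(n)
F = hessenberg(B)               # one-time reduction B = Q*T*Q' with T = F.H tridiagonal
Qtb = F.Q' * b                  # O(n^2) multiplication by Q^T, done once per right-hand side
for k in (0.1, 1.0)
    z = (F.H + k^2 * I) \ Qtb   # O(n) tridiagonal solve for each shift
    x = F.Q * z                 # map back with another O(n^2) multiplication by Q
    @show k, norm((B + k^2 * I) * x - b)
end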

You don’t even need an explicit Cholesky factorization, since for symmetric tridiagonal matrices I seem to recall that we have a T \ b method that works directly with T (and is also O(n)). In fact, there is an ldiv!(T, b; shift=µ) method that computes (T + µ*I) \ b in-place in b. Update: checking the source code, it seems that it does form an ldlt factorization explicitly; it is still O(n), though, and in principle the factorization could probably be elided into the solve.

If you compute a Hessenberg factorization object F = hessenberg(A) in Julia, this is exploited under the hood. As explained in the documentation it directly supports shifts: (F + µ*I) \ b or ldiv!(F + µ*I, b) is very cheap (O(n) for Hermitian matrices, and O(n^2) for general matrices).

What is (currently) missing is a BandedMatrices.jl method that creates a Hessenberg factorization object from a banded matrix, with an implicit representation of Q as a product of Givens rotations. This should be straightforward to add, in principle, since much of the code is already written.

The Hessenberg approach could be an especially big win if you are solving for different shifts with the same right-hand side, since in that case you may only need to apply the Q transformation once and then just do tridiagonal solves for each shift.

Finding the Hessenberg factorization is significantly cheaper than computing the diagonalization of the matrix — in fact, the first step of diagonalization is usually a Hessenberg factorization, so this saves you the subsequent iterative diagonalization steps.

(For general square matrices, the analogous thing to diagonalization for shifted solves would typically be a Schur factorization, not an SVD, but again Hessenberg is the first step of a Schur calculation so working with Hessenberg is cheaper.)

But again, for banded matrices with small bandwidth, every reasonable algorithm is linear O(n) time, including just re-computing the Cholesky factorization for each shift — so it boils down to a battle of the constant factors and which method is more optimized. It’s not obvious a priori which method will “win”, but on the other hand they are all so fast that it may not matter much.

That’s not ideal in terms of stability. It looks like that method is currently using LDL^T with no pivoting and no blocking, so that D is really diagonal without any 2\times 2 blocks. That can be unstable:

A = SymTridiagonal([
  3e-10 1 0
  1     2 1 
  0     1 2
])

F = ldlt(A)
display(A - F.L * F.D * F.L')

gives

3×3 Matrix{Float64}:
 0.0  0.0         0.0
 0.0  4.76837e-7  0.0
 0.0  0.0         0.0

There’s a blocking strategy due to Bunch that doesn’t do any interchanges, but does use 2\times 2 block pivots and was proven to be stable by N. Higham in this paper. It’s still linear complexity. Edit: L and D do require more storage in the stable algorithm, so ldlt! isn’t something that can be done fully in-place, but it seems worth it.

For a 5000x5000 tridiagonal matrix, performing Cholesky on the shifted matrix is much, much faster than computing a Hessenberg factorization (as has already been noted by Steven). It seems that even if a banded Hessenberg factorization were available, it may not be as fast as Cholesky. This may be true even if the shifted system has to be inverted several times.

Ravi

That’s because there (currently) is no specialized Hessenberg factorization for banded matrices, so this comparison is pointless at the moment — you are comparing an O(n^3) algorithm to an O(n) algorithm. You misunderstood me — my suggestion requires first implementing banded Hessenberg.

(Implementing an optimized Hessenberg factorization for tridiagonal matrices is easy, albeit somewhat pointless — it’s the identity operation, since tridiagonal matrices are already in upper-Hessenberg form.)

Yes, I got it. Thanks.

Do you agree with the claim that solving the shifted system using Cholesky might still be faster than banded Hessenberg factorization?

On a somewhat different note, what do you think about using Krylov subspace iterative solvers for the shifted system? It seems that they might be more useful for a dense SPD matrix. Am I correct?

Ravi

Yes, since both are O(n), it’s hard to say a priori which will be faster.

(Unless you are using the same right-hand-side with different shifts, so that you can re-use the action of Q on the vector(s). For example, if you are computing x^T (A + \mu I)^{-1} y = (Q^T x)^T (H + \mu I)^{-1} (Q^T y) for multiple shifts \mu but with the same x and y, then you can compute the Q^T multiplications once, and each subsequent shift will only require a tridiagonal solve with the H + \mu I factor. This would be a definite win over re-running a Cholesky factorization for each shift on a banded matrix, assuming a bandwidth > tridiagonal.)
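A rough sketch of that reuse pattern with the dense hessenberg (placeholder matrix, vectors, and shifts; a banded Hessenberg would make the two Q^T multiplications cheaper as well):

using LinearAlgebra

n = 500
B = randn(n, n)
A = Symmetric(B'B + I)          # placeholder SPD matrix
x, y = randn(n), randn(n)
F = hessenberg(A)
Qtx, Qty = F.Q' * x, F.Q' * y   # pay for the Q^T multiplications once
vals = [dot(Qtx, (F.H + μ * I) \ Qty) for μ in (0.1, 1.0, 10.0)]  # one tridiagonal solve per shift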

I might be missing something, but if \boldsymbol{A} is an SPD (Symmetric Positive Definite) matrix, then you have its eigen decomposition:

\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{T}

Where \boldsymbol{\Lambda} is a diagonal matrix and \boldsymbol{U} is an orthogonal matrix. Then:

\begin{aligned} \left( \boldsymbol{A} + \alpha \boldsymbol{I} \right) \boldsymbol{x} & = \left( \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{T} + \alpha \boldsymbol{I} \right) \boldsymbol{x} \\ & = \boldsymbol{U} \left( \boldsymbol{\Lambda} + \alpha \boldsymbol{I} \right) \boldsymbol{U}^{T} \boldsymbol{x} \\ & = \boldsymbol{U} \boldsymbol{D}_{\alpha} \boldsymbol{U}^{T} \boldsymbol{x} \end{aligned}

Where \boldsymbol{D}_{\alpha} is a diagonal matrix with {D}_{ii} = {\lambda}_{i} + \alpha, where {\lambda}_{i} is the i-th eigenvalue of \boldsymbol{A}.

Plugging it into the linear system:

\boldsymbol{U} \boldsymbol{D}_{\alpha} \boldsymbol{U}^{T} \boldsymbol{x} = \boldsymbol{y} \iff \boldsymbol{D}_{\alpha} \boldsymbol{U}^{T} \boldsymbol{x} = \boldsymbol{U}^{T} \boldsymbol{y}

Hence:

\boldsymbol{x} = \boldsymbol{U} \boldsymbol{D}_{\alpha}^{-1} \boldsymbol{U}^{T} \boldsymbol{y}

Now, for each update, all that is needed is to recalculate \boldsymbol{D}_{\alpha} and multiply by the orthogonal matrices.

This will also work for SPSD matrices as long as \alpha > 0.

Update

It will only work if you can hold the matrix \boldsymbol{U} in memory (and do the eigendecomposition).
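A minimal sketch of this approach in Julia (placeholder matrix and shifts; eigen does the one-time work, after which each shift needs only a diagonal scaling and multiplications by \boldsymbol{U}):

using LinearAlgebra

n = 500
B = randn(n, n)
A = Symmetric(B'B)             # placeholder SPD matrix
y = randn(n)
λ, U = eigen(A)                # one-time eigendecomposition: A = U * Diagonal(λ) * U'
Uty = U' * y                   # can be reused as long as y stays the same
for α in (0.1, 1.0)
    x = U * (Uty ./ (λ .+ α))  # x = U * (Λ + α I)^{-1} * U^T * y
    @show α, norm((A + α * I) * x - y)
end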

Yes, you can also use the diagonalization for shifted solves (or the Schur form for non-Hermitian matrices). But, as I commented above, Hessenberg factorization is cheaper than diagonalization — in fact, a Hessenberg factorization is the first step of typical diagonalization algorithms.

For example:

julia> using LinearAlgebra, BenchmarkTools

julia> A = rand(1000,1000); A = Symmetric(A'A); # random SPD matrix;

julia> @btime eigen($A);
  85.904 ms (13 allocations: 23.25 MiB)

julia> @btime hessenberg($A);
  27.752 ms (7 allocations: 7.90 MiB)

@stevengj, I can’t argue with the math, especially when it is backed by empirical metrics as you showed.

Yet the solution to the above problem has two steps:

  1. The one time decomposition.
  2. The solution per \alpha.

For (1), going the path you suggested is clearly faster.
Do you agree that for (2) the eigendecomposition approach will be faster?

Let’s say (2) is done N times.

So in order to balance (1), one needs N to be large enough that the (smaller) gain in (2) balances the deficit in (1).

This probably has to do with the dimensions of the matrix and the value of N.
The OP should try it on his own problem and make the right choice.

Depends. I agree that a diagonal solve is faster than a tridiagonal solve (although both of them might be so fast that other parts of the computation dominate, depending on what else you are doing).

But there is also the representation of Q, especially if this must be applied at each shift (if your vectors are changing too, not just the shift). For dense matrices, I don’t think there is any particular advantage one way or the other. But if the matrix is banded (as the original poster’s is), and you implemented banded Hessenberg (which so far has not been done, but hopefully will be some day), then the Hessenberg Q can be expressed as an O(n) product of Givens rotations. Whereas the diagonalization’s Q is significantly more expensive (since to get a sparse representation you’d need to store all the steps of your QR algorithm or whatever diagonalization algorithm you are using).

Actually, it looks like they have an internal tridiagonalization routine that exposes the Givens rotations.

@rvaradhan
I have a special implementation of CG in Krylov.jl dedicated to multiple shifts: Hermitian positive definite linear systems · Krylov.jl

If the convergence is slow, you can use the Cholesky decomposition of A (without shift) as a preconditioner.
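For example, something along these lines (a sketch: the test matrix and shift values are placeholders, and the multi-shift routine is cg_lanczos_shift, if I have the name right; please check the linked documentation for the exact call):

using LinearAlgebra, SparseArrays, Krylov

n = 10_000
# placeholder banded SPD matrix (second-difference matrix)
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = randn(n)
shifts = [0.01, 0.1, 1.0]                  # the k^2 values
x, stats = cg_lanczos_shift(A, b, shifts)  # x[i] approximately solves (A + shifts[i]*I) * x = b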

2 Likes