Solving a Shifted Linear System of 2 Symmetric Positive Semi-Definite (SPSD) Matrices

I have the following linear system:

\left( \boldsymbol{A} + \lambda \boldsymbol{C} \right) \boldsymbol{x} = \boldsymbol{A}^{T} \boldsymbol{b}

Where \boldsymbol{A}, \boldsymbol{C} \in \mathcal{S}_{+}^{n} are Symmetric Positive Semi-Definite (SPSD) matrices.

I need to solve the system for various \lambda > 0 values.
Besides using an LDL decomposition of \boldsymbol{A} + \lambda \boldsymbol{C} for each \lambda, or a “Hot Iteration” (warm start), is there an efficient way to solve this, similar to the case \boldsymbol{C} = \boldsymbol{I}?

I can’t assume the matrices are similar.
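
For reference, the \boldsymbol{C} = \boldsymbol{I} case I have in mind is the standard trick of factorizing \boldsymbol{A} once and then doing cheap shifted solves for every \lambda. A minimal sketch in Julia (assuming A and b are already defined; the loop values of λ are just placeholders):

using LinearAlgebra

F = hessenberg(Hermitian(A))   # A = Q*H*Q' with H symmetric tridiagonal, computed once
for λ in (0.1, 1.0, 10.0)
    x = (F + λ*I) \ (A * b)    # solves (A + λ*I) x = A'*b, reusing the factorization
end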

Remark: One could think of the above as the Normal Equations of a Tikhonov-Regularized Least Squares problem:

\arg \min_{\boldsymbol{x}} \frac{1}{2} {\left\| \boldsymbol{D} \boldsymbol{x} - \boldsymbol{b} \right\|}_{2}^{2} + \frac{\lambda}{2} {\left\| \boldsymbol{E} \boldsymbol{x} \right\|}_{2}^{2}

Where \boldsymbol{A} = \boldsymbol{D}^{T} \boldsymbol{D} and \boldsymbol{C} = \boldsymbol{E}^{T} \boldsymbol{E}.
If there is a specialized way to solve this for various \lambda values without squaring the condition number, that would be even better.

There are a few options, but it is a harder problem and those options aren’t as nice as for A-\lambda I. I’m assuming that you aren’t trying to exploit sparsity. You could try treating this as a symmetric semi-definite generalized eigenvalue problem to simultaneously diagonalize A and C with a common congruence formed from generalized eigenvectors. The stability of that is not perfect, however. The eigenvector matrix can be ill-conditioned and if C is semidefinite instead of positive definite, you are not even guaranteed a complete set of eigenvectors.
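
For concreteness, here is a sketch of that simultaneous-diagonalization approach in Julia, in the friendlier situation where C happens to be positive definite (eigen(A, C) for symmetric A and SPD C returns C-orthonormal eigenvectors; I would not rely on this when C is singular or badly conditioned):

using LinearAlgebra

# assumes A is SPSD and C is SPD; Λ, V satisfy A*V = C*V*Diagonal(Λ) and V'*C*V = I
Λ, V = eigen(Symmetric(A), Symmetric(C))
g = V' * (A * b)                 # transform the right-hand side once
for λ in (0.1, 1.0, 10.0)
    x = V * ((Λ .+ λ) .\ g)      # solves (A + λ*C) x = A'*b for this shift
end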

What I have done before is just to give up on exploiting symmetry. If you compute a Hessenberg-triangular form of A and C:

Q^T A Z = H, \qquad Q^T C Z = R

where H is upper Hessenberg and R is upper triangular. Q and Z are orthogonal. Your problem transforms to

(H + \lambda R) \mathbf{y} = Q^T A^T \mathbf{b}

where \mathbf{y} = Z^T \mathbf{x}. Each choice of \lambda then gives a Hessenberg system to solve, which is more efficient. The LAPACK routine for Hessenberg-triangular reduction is xGGHRD (or the blocked xGGHD3). Unfortunately, there doesn’t appear to be an existing Julia interface to that routine, but it probably isn’t too hard to write; I’ve found the existing Julia interfaces not that hard to follow as a template for accessing other LAPACK routines. Although, to be honest, the last time I had a problem like this, I was too lazy to do that and just used schur(A,C), so it had to run the full QZ algorithm. It was mostly something I was doing for some test code and it slowed down the tests, but I could live with that.

There is a symmetry preserving reduction that will use a congruence to make A tridiagonal and C diagonal, which is a lot closer to the A-\lambda I case. But you give up on orthogonality of the congruence. There is a description in this paper. I’d trust the non-symmetric approach more for stability.

I don’t know what options there are that avoid condition squaring for the regularized least-squares problem. I have actually thought a little about that before, but didn’t come up with anything, and I haven’t seen anything published. I’m not sure that it’s reasonable to expect to do this. Using orthogonal transformations on D and E would automatically preserve symmetry when you form D^T D and E^T E, but symmetry and orthogonality don’t seem to go together for this type of reduction. So you can probably do something, but I wouldn’t expect it to be fully based on orthogonal transformations, and there might be stability concerns.


One option is to first Cholesky-factor C = U^T U, let y = U x, then rewrite the system as:

(\underbrace{U^{-T} A U^{-1}}_{B} + \lambda I) y = U^{-T} A^T b = c

Note that the matrix B is still SPSD. Note also that, as usual, you don’t actually invert any matrices: you compute B via triangular solves B = U' \ A / U.

Once you have it in this form, you can call hessenberg(Hermitian(B)) to factorize B into Q H Q^T, where H is symmetric tridiagonal. The advantage of this is that you can then do shifted solves without repeating the factorization, since B + \lambda I = Q (H + \lambda I) Q^T.

(Diagonalizing B into eigenvectors would also accomplish this, but is more expensive. On the other hand, diagonalization results in a faster solve step, so it depends on how many \lambda you need to solve with.)

Moreover, the Hessenberg factorization objects in Julia have efficient specialized support for shifted solves, for exactly this sort of application.

In short, you can do something like:

# pre-processing
U = cholesky(C).U # UpperTriangular
B = Hermitian(U' \ A / U)
c = U' \ (A'b)
F = hessenberg(B) # tridiagonal

# solve for each λ
λ = 0.1234
y = (F + λ*I) \ c
x = U \ y

# check:
@show x ≈ (A + λ*C) \ (A'b)  # should print "true"

Note that F + \lambda*I is actually lazy — it doesn’t even allocate a new matrix. And you can do the y and x solves in-place too, if you have pre-allocated vectors x and y:

ldiv!(y, F + λ*I, c)
ldiv!(x, U, y)

Usually if you are doing Tikhonov regularization, the resulting system should be well-conditioned, no?

If it’s ill-conditioned, you could equivalently solve the ordinary least-squares problem

\min_x \left\Vert \begin{pmatrix} D \\ \sqrt{\lambda} E \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\Vert_2

by QR as usual (\ in Julia), but I’m not sure if there is a way to re-use calculations for multiple \lambda in that case.
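
In Julia the stacked problem is just a tall least-squares solve with \ (QR under the hood); a sketch, assuming D, E, b, and \lambda are already defined:

# equivalent to (D'D + λ*E'E) x = D'*b, but without forming the normal equations
x = [D; sqrt(λ) * E] \ [b; zeros(size(E, 1))]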


Indeed. I mostly meant to ask whether solving it directly, rather than through the Normal Equations, can yield something even better.

The only issue with your solution is handling the case where the matrices \boldsymbol{A} and \boldsymbol{C} are not SPD, and hence the Cholesky factorization is not feasible.
Unless you mean \boldsymbol{U} is derived from the Eigen Decomposition, taking the square root of the diagonal matrix. But then the assumption that \boldsymbol{U} is triangular won’t hold.

Could it be done with LDLt?

You mean C could be singular?

The whole point of Tikhonov is to choose an E such that C = E^T E is positive-definite, as that is what makes the regularization A + \lambda C work. That is, you should choose an E matrix with independent columns (full column rank). Otherwise your problem need not even have a unique answer, and you need to re-think what you are even doing.

As long as C is SPD and \lambda > 0, then it doesn’t matter if A is only SPSD.

If A and C could be arbitrary SPSD matrices, then you haven’t provided enough information to specify an answer. e.g. what x do you want to get if A = C = zeros(m,m)? Or is your problem somehow such that A + \lambda C is always nonsingular/SPD?

PS. If at least one of either A or C is SPD, you are fine: if C is singular but A is SPD, simply swap the two matrices by solving (C + \mu A)x = \mu A^T b where \mu = \lambda^{-1}, and apply my approach by Cholesky-factoring A instead of C.
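
Concretely, the same recipe as above carries over after the swap; a sketch, assuming A is SPD (C may be singular) and reusing the variable names from the earlier snippet:

# pre-processing: Cholesky-factor A instead of C
U = cholesky(Hermitian(A)).U          # A = U'U
B = Hermitian(U' \ C / U)             # still SPSD
F = hessenberg(B)                     # symmetric tridiagonal, computed once

# solve for each λ via μ = 1/λ:  (C + μ*A) x = μ*A'*b
μ = inv(λ)
y = (F + μ*I) \ (μ * (U * b))         # note U' \ (A'b) ≈ U*b, since A = U'U
x = U \ y
@show x ≈ (A + λ*C) \ (A'b)           # should print "true"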

I agree that in the usual case things are as you described.
Yet in my case, both are only SPSD; in some cases both are indeed singular.

For the case where one of them is SPD, I was aware of the trick you mentioned, though with the Eigen Decomposition instead of the Hessenberg one (I probably saw you posting it somewhere else in the forum).

My own reason for wanting decompositions of this type was looking at error bounds for the symmetric semidefinite generalized eigenvalue problem and comparing a more stable algorithm to the standard Cholesky approach when C is ill-conditioned (and, for the algorithm I was working on, possibly singular). It was an intermediate step in using inverse iteration to compute condition number estimates of A-\lambda B, where \lambda is an approximate generalized eigenvalue, to determine whether \lambda had been computed in a way that gives a small backward error. I really needed to stick purely to orthogonal transformations to preserve the singular values of A-\lambda B for the comparison to be meaningful.

I did take the part about C being singular to heart, but it is certainly odd to see a linear system of this type where C is singular. It would be nice to know where it comes from.


In my actual problem, in one of the steps I optimize something like:

\arg \min_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{E}_{1}^{T} \hat{\boldsymbol{A}} \boldsymbol{E}_{1} \boldsymbol{x} + \lambda \boldsymbol{x}^{T} \boldsymbol{E}_{2}^{T} \hat{\boldsymbol{C}} \boldsymbol{E}_{2} \boldsymbol{x}

Where the matrices \boldsymbol{E}_{1}, \boldsymbol{E}_{2} select a weighted subset of \boldsymbol{x}. They might have zero columns; moreover, they might share the same indices of such zero columns.
So the above \boldsymbol{A} = \boldsymbol{E}_{1}^{T} \hat{\boldsymbol{A}} \boldsymbol{E}_{1}, \; \boldsymbol{C} = \boldsymbol{E}_{2}^{T} \hat{\boldsymbol{C}} \boldsymbol{E}_{2}.

By the nature of the matrices, they are not guaranteed to be SPD.

I can do some cherry-picking of the elements, but the issue is that this is a single step out of multiple steps, and it is nicer to keep this structure even if it means the matrices are SPSD and not SPD.

I just thought a similar trick to the one above can be employed in such case as well.

I have wanted to try this for a while and I did try setting up an interface to the Hessenberg-triangular form from LAPACK with the following:

using LinearAlgebra
using LinearAlgebra: checksquare, libblastrampoline
using LinearAlgebra.LAPACK: BlasInt, @chkvalidparam, @blasfunc, chkstride1, chklapackerror
using Base: require_one_based_indexing

for (gghd3, elty) in
    ((:dgghd3_,:Float64),
     (:sgghd3_,:Float32))
    @eval begin
        function gghd3!(jobq::AbstractChar, jobz::AbstractChar,
                        A::AbstractMatrix{$elty}, B::AbstractMatrix{$elty})
            require_one_based_indexing(A, B)
            @chkvalidparam 1 jobq ('N', 'I')
            @chkvalidparam 2 jobz ('N', 'I')
            chkstride1(A, B)
            n, m = checksquare(A, B)
            ilo = 1
            ihi = n
            if n != m
                throw(DimensionMismatch(
                    lazy"dimensions of A, ($n,$n), and B, ($m,$m), must match"))
            end
            ldq = jobq == 'I' ? max(1, n) : 1
            q = similar(A, $elty, ldq, n)
            ldz = jobz == 'I' ? max(1, n) : 1
            z = similar(A, $elty, ldz, n)
            work = Vector{$elty}(undef, 1)
            lwork = BlasInt(-1)
            info = Ref{BlasInt}()
            for i = 1:2  # first call returns lwork as work[1]
                ccall((@blasfunc($gghd3), libblastrampoline), Cvoid,
                      (Ref{UInt8}, Ref{UInt8}, #compq, compz
                       Ref{BlasInt}, Ref{BlasInt}, Ref{BlasInt},  # n, ilo, ihi
                       Ptr{$elty}, Ref{BlasInt}, Ptr{$elty}, # A, lda, B
                       Ref{BlasInt}, Ptr{$elty}, Ref{BlasInt}, # ldb, Q, ldq
                       Ptr{$elty}, Ref{BlasInt}, Ptr{$elty}, # Z, ldz, work
                       Ref{BlasInt}, Ref{BlasInt}, # lwork, info
                       Clong, Clong), # hidden lengths of the two Char args
                      jobq, jobz,
                      n, ilo, ihi, A, max(1,stride(A, 2)), B,
                      max(1,stride(B, 2)), q, ldq, z,
                      ldz, work, lwork, info, 1, 1)
                chklapackerror(info[])
                if i == 1
                    lwork = BlasInt(real(work[1]))
                    resize!(work, lwork)
                end
            end
            UpperHessenberg(A), UpperTriangular(B),
            q[1:(jobq == 'I' ? n : 0),:], z[1:(jobz == 'I' ? n : 0),:]
        end
    end
end


function hess_triangular_form(A,B)
    Q1, R = qr(B)
    Ta, Tb, Q, Z = gghd3!('I', 'I', Q1'*A, R)
    return Ta, Tb, Q1*Q, Z
end

This can be used to solve the system with something like

using LinearAlgebra

n=100
A = randn(n,n)
B = randn(n,n)
Ta, Tb, Q, Z = hess_triangular_form(A,B)

@show opnorm(A-Q*Ta*Z', Inf)
@show opnorm(B-Q*Tb*Z', Inf)

b = randn(n)
y = Q'*A'*copy(b)
λ = 7.0
ldiv!(UpperHessenberg(Ta + λ * Tb), y)
x = Z*y
@show norm((A + λ * B) * x - A'*b)

which gives output:

opnorm(A - Q * Ta * Z', Inf) = 2.580362139237291e-13
opnorm(B - Q * Tb * Z', Inf) = 2.2466750682070824e-13
norm((A + λ * B) * x - A' * b) = 7.710934030831613e-13

Solving should be O(n^2) once the Hessenberg-triangular form is computed at cost O(n^3). The built-in schur(A,B) in Julia goes a bit further than the Hessenberg-triangular form and makes A quasi-triangular (with at most 2\times 2 blocks on the diagonal) and B triangular. It can be used in the same way, since it’s a more structured Hessenberg-triangular form. I thought the basic Hessenberg-triangular form would be a substantial performance improvement over going to the full Schur form, but again with random A and B, for n=1000 I get

julia> @benchmark hess_triangular_form(A,B)
BenchmarkTools.Trial: 2 samples with 1 evaluation per sample.
 Range (min … max):  3.845 s …   3.933 s  ┊ GC (min … max): 0.04% … 0.00%
 Time  (median):     3.889 s              ┊ GC (median):    0.02%
 Time  (mean ± σ):   3.889 s ± 62.658 ms  ┊ GC (mean ± σ):  0.02% ± 0.03%

and

julia> @benchmark schur(A,B)
BenchmarkTools.Trial: 2 samples with 1 evaluation per sample.
 Range (min … max):  4.608 s …   4.658 s  ┊ GC (min … max): 0.00% … 0.02%
 Time  (median):     4.633 s              ┊ GC (median):    0.01%
 Time  (mean ± σ):   4.633 s ± 35.495 ms  ┊ GC (mean ± σ):  0.01% ± 0.01%

(Note in the above numbers, the computer I’m running this on is ancient.)
So it appears to be pretty marginal. Profiling suggests that hess_triangular_form is spending most of its time in the ccall, so I don’t think the allocations and the simple set-up of the problem are taking much time. I’m a bit surprised, but maybe the Hessenberg-triangular form is a larger percentage of the schur run time than I thought it was. I was expecting half or less.
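
For completeness, schur(A,B) can be used in the same shift-and-solve way as above; since S is only quasi-triangular, S + λT is upper Hessenberg, so the specialized Hessenberg solver still applies. A sketch, reusing the same A, B, b, and λ from the example:

using LinearAlgebra

F = schur(A, B)            # generalized Schur: A = F.Q*F.S*F.Z', B = F.Q*F.T*F.Z'
y = F.Q' * (A' * b)
x = F.Z * (UpperHessenberg(F.S + λ * F.T) \ y)
@show norm((A + λ * B) * x - A' * b)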
So it appears to be pretty marginal. Profiling suggests that hess_triangular_form is spending most of its time in the ccall, so I don’t think the allocations and the simple set-up of the problem is taking much time. I’m a bit surprised, but maybe the Hessenberg-triangular form is a larger percentage of the schur run time than I thought it was. I was expecting half or less.