Solve linear system under Hermitian constraint

Is there a nice way to solve the linear system XA = B under the constraint X = X^\dagger? Here A and B are complex matrices that are not necessarily square.

What I came up with is to solve YA = B and A^\dagger Y = B^\dagger, and let X = (Y + Y^\dagger)/2. Now to solve this I used the identities

\operatorname{vec}(MN) = (I \otimes M) \operatorname{vec}(N) = (N^T \otimes I) \operatorname{vec}(M)

which allow me to map the system into

\begin{pmatrix} A^T \otimes I \\ I \otimes A^\dagger \end{pmatrix} \operatorname{vec}(Y) = \begin{pmatrix} \operatorname{vec}(B) \\ \operatorname{vec}(B^\dagger) \end{pmatrix}

This works, but I can’t help but feel that I’m being stupid and there must be a more elegant way. I couldn’t find a solver for this kind of problem in MatrixEquations.jl.
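
Here is a minimal runnable sketch of what I mean, on a small consistent example (the sizes and the Hermitian H used to generate B are just for illustration):

using LinearAlgebra

# small consistent example (sizes and the Hermitian H are illustrative choices)
m, n = 5, 2
A = randn(ComplexF64, m, n)
B = hermitianpart!(randn(ComplexF64, m, m)) * A

# stack the two vectorized equations and solve for Y (rank-deficient but consistent)
M = [kron(transpose(A), I(m)); kron(I(m), A')]
Y = reshape(M \ [vec(B); vec(B')], m, m)

X = (Y + Y') / 2          # Hermitian by construction
norm(X*A - B) / norm(B)   # small backwards error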

There may be a more clever method out there, and if there is, I bet someone here can chime in with it.

But for what it’s worth, as a sanity check: the technique you’re using here doesn’t seem crazy or even too inefficient (assuming you’re not actually materializing the full matrix with the direct Kronecker products of A^T and A^\dagger with identity matrices).

I looked around a bit and found this MathOverflow discussion of the same problem, except with a symmetric constraint, and they seem to be using an equivalent(?) technique.

This doesn’t necessarily have a unique solution, depending on the sizes of A and B, no? Which solution do you want?

As @Mason alluded to, using Kronecker products to solve for an m \times m matrix works, but if you materialize the matrices and use a dense solver they are usually pretty slow: they transform O(m^3) problems into O(m^6) problems. If you use an iterative solver and apply the Kronecker products implicitly through the tensor-product identity, this issue is avoided, but iterative methods have their own complications.
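
(For concreteness, “applying the Kronecker products implicitly” just means using the vec identity above directly, so you only ever do small matrix products; a quick sanity check with arbitrary sizes:)

using LinearAlgebra

# sanity check of the vec identity with arbitrary small sizes (p, q, r are illustrative)
p, q, r = 3, 4, 2
M, N = randn(p, q), randn(q, r)
vec(M*N) ≈ kron(I(r), M) * vec(N) ≈ kron(transpose(N), I(p)) * vec(M)   # true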

Presumably any least squares solution which satisfies the Hermitian constraint.

What do you mean by a “least-squares solution”? If it’s overdetermined (if A,B are too wide), then it has no exact solution.

Do you mean that they want a minimum-norm solution if it’s underdetermined (if A,B are too narrow)? Or just any solution at all if it’s underdetermined?

For the application I care about the problem is usually underdetermined, and any solution is fine. If it’s overdetermined it means something went terribly wrong, and the solution doesn’t matter.

It’s not obvious to me that this has a solution, without additional constraints on B?

e.g. for m,n = 100,2 and A, B = randn(m,n), randn(m,n), your matrix M = [kron(A', I(m)); kron(I(m), A')] has rank of only 396 (4 singular values of zero), and [vec(B); vec(B')] is not in the column space for a random B (so there is no exact solution for Y).

(It does always have solutions for n=1. In this case, however, you can do even better: there is a closed-form minimum-norm solution that you can derive via Lagrange multipliers similar to BFGS updates.)
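
For example, here is a sketch of one such closed form (minnorm_hermitian is just an illustrative name; the formula assumes the consistency condition that a^\dagger b is real, which it must be for a Hermitian solution to exist):

using LinearAlgebra

# sketch: minimum-norm Hermitian X with X*a = b for a single column a (n = 1),
# assuming a'b is real (it must be, since a'*X*a is real for Hermitian X)
minnorm_hermitian(a, b) = (a*b' + b*a')/real(a'a) - real(a'b)/real(a'a)^2 * (a*a')

a = randn(ComplexF64, 100)
b = hermitianpart!(randn(ComplexF64, 100, 100)) * a   # consistent right-hand side
X = minnorm_hermitian(a, b)
norm(X*a - b) / norm(b), norm(X - X')                 # both small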

If it works, I’m guessing that you obtain your B matrix in a way that is related to A. How?

(Of course, a solution exists if I let B = HA for some Hermitian H, but if you got it this way then presumably you could just use X=H.)

PS. Your equation is equivalent to a “Sylvester–transpose equation”. Most of the papers I’ve come across solve this kind of equation only approximately, in the least-squares sense, which makes sense if there is generally no exact solution, but maybe a deeper literature search will turn up simple consistency criteria.
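
(Concretely, writing X = Y + Y^\dagger turns XA = B into the Sylvester-transpose form

YA + Y^\dagger A = B

and conversely any solution Y of that equation gives a Hermitian solution X = Y + Y^\dagger; this is one way to see the equivalence.)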

Indeed, there is no solution for general A and B. In my case, though, these matrices do have a special structure: they come from moment matrices used in non-commutative polynomial optimization, and there’s a theorem by Helton that guarantees the existence of a solution. I wrote about it here. Unfortunately his theorem is of absolutely no help in finding the solution, which is why I’m here.

The MathOverflow discussion @Mason linked has a nice reduction to the Lyapunov equation in the real case, which MatrixEquations.jl can handle. I’m trying to adapt it to the complex case now.

For reference, I applied the same technique as in those notes for the m \times m complex matrix X = X^\dagger with m \times n constraints XA = B (using CR calculus to handle the complex derivatives), and found that you can reduce it to a smaller m \times n Sylvester-transpose equation for the minimum-norm solution (minimizing \Vert X \Vert_F), if it exists:

X=A \Lambda^\dagger + \Lambda A^\dagger

where \Lambda is an m \times n matrix (of Lagrange multipliers) satisfying the Sylvester-transpose equation:

A \Lambda^\dagger A + \Lambda A^\dagger A = B

This equation does not have a solution for a general right-hand side B, but if you have consistent A, B you can solve it quickly by an iterative method. For example:

using LinearAlgebra, KrylovKit

# generate consistent data
m,n = 1000, 123
A = randn(ComplexF64, m,n); B = hermitianpart!(randn(ComplexF64,m,m)) * A;

# solve A*Λ'*A + Λ*(A'*A) = B for Λ, applying the operator matrix-free
λ, = linsolve(y -> let Y = reshape(y,m,n); vec(A*(Y'*A)+Y*(A'*A)); end, vec(B));

Λ = reshape(λ, m, n); X = A*Λ' + Λ*A';

gives a solution (verified by small backwards error) very quickly with the GMRES implementation in KrylovKit.jl (for me, it converged in 39 matvecs in the example above).

This has the advantage of being a smaller system (mn unknowns rather than m^2), assuming n \ll m, and the complexity of each matvec (each iterative step) is O(mn^2) rather than O(m^2 n) compared to iteratively solving either your equation or the smaller Sylvester-transpose system YA + Y^\dagger A = B. (And explicitly forming the Kronecker-product matrices is vastly more expensive, of course.)

PS. I was a little surprised in retrospect that KrylovKit doesn’t get confused by the complex conjugations in my operator? It’s not strictly a linear operator unless you reinterpret the input/output vectors in terms of real and imaginary parts, though you can easily do this of course. Maybe the standard GMRES method is actually equivalent to this, however(?), since least-squares and orthogonality in the complex-vector sense maps exactly onto least-squares and orthogonality for a vector of real/imaginary parts. Might be safer to use a reinterpret to map between complex vectors and real vectors of twice the length.
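
Something like the following would be one way to do that, reusing A, B, m, n and the operator from the example above (sylvop, c2r, and r2c are just illustrative names; the collect calls are only there to keep plain Vector types):

# the same operator as above, but viewed as a map on real vectors of twice the length
sylvop(y) = let Y = reshape(y, m, n)
    vec(A*(Y'*A) + Y*(A'*A))
end
c2r(z) = collect(reinterpret(Float64, z))      # complex vector -> real vector
r2c(x) = collect(reinterpret(ComplexF64, x))   # real vector -> complex vector

xr, = linsolve(x -> c2r(sylvop(r2c(x))), c2r(vec(B)));
Λ = reshape(r2c(xr), m, n); X = A*Λ' + Λ*A';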

It looks to me like their derivation is incorrect? They have:

X (A^\top A) + (A^\top A) X = A^\top B + B^\top A

It seems like they lost a transpose in this step — the second equation should instead be:

\underbrace{X^\top}_\text{note!} (A^\top A) + (A^\top A) X = A^\top B + B^\top A

which is (again) a Sylvester-transpose equation, not a standard Sylvester or Lyapunov equation, and MatrixEquations appears not to handle this.

Correction (see below): It doesn’t matter, since X = X^\top.

I did notice that this transpose was missing, but it doesn’t matter, because the Lyapunov solver will only give you symmetric solutions anyway.

Oh, right, carry on.

It does make me wonder if MatrixEquations.jl is doing something wrong though. I don’t see anything in the derivation that should matter if A or B are complex or real (other than replacing the transpose with the Hermitian adjoint), yet I fail to get solutions in the complex case:

julia> using MatrixEquations

julia> let N = 2, M = 2
           A = rand(ComplexF64, N, M)
           B = rand(ComplexF64, N, M)
           
           X = lyapc(A'A, -(A'B + B'A))
           @info "" A*X B
       end
┌ Info: 
│   A * X =
│    2×2 Matrix{ComplexF64}:
│     0.509246+0.470237im  0.379522+0.025039im
│     0.510879+0.698692im  0.482466+0.151896im
│   B =
│    2×2 Matrix{ComplexF64}:
│     0.588515+0.877156im  0.481588+0.48752im
└     0.160942+0.620802im  0.379733+0.156699im

Am I just doing something dumb here?

I notice though that for real equations the agreement is better, but still quite bad:


julia> let N = 2, M = 2
           A = rand(Float64, N, M)
           B = rand(Float64, N, M)
           
           X = lyapc(A'A, -(A'B + B'A))
           @info "" A*X B
       end
┌ Info: 
│   A * X =
│    2×2 Matrix{Float64}:
│     0.659881  0.741435
│     0.168498  0.284469
│   B =
│    2×2 Matrix{Float64}:
│     0.643583  0.66734
└     0.345046  0.41082

So maybe the algorithm in MatrixEquations.jl isn’t well suited to this?

I think it carries through straightforwardly with CR calculus. You are minimizing \Vert XA - B \Vert_F subject to X = X^\dagger, and you end up with the (singular) Lyapunov equation:

X(AA^\dagger) + (AA^\dagger)X = BA^\dagger + AB^\dagger

In principle, this corresponds to

X = lyap(Hermitian(A*A'), hermitianpart(-2B*A'))

in the LinearAlgebra stdlib, but that function does not like the fact that AA^\dagger is singular. But

X = MatrixEquations.lyapc(Hermitian(A*A'), hermitianpart(-2B*A'))

seems to work, and gives a small backwards error in my random 1000 \times 123 example from above:

julia> norm(X*A - B) / norm(B)
2.6016604106556882e-11

(Would be good to check on exactly what it is doing with the singularity of these equations, internally, to make sure you don’t run into a nasty surprise at some point. By comparison to my previous solution, it doesn’t seem to be finding the minimum-norm X.)

However, note that this method is O(m^3), whereas the minimum-norm iterative solution from above is O(mn^2) (modulo convergence rate). This might matter for large m.

I did test lyapc, it works fine when the solution exists:

using LinearAlgebra, MatrixEquations

A = randn(ComplexF64, 3, 2); H = Hermitian(randn(ComplexF64, 3, 3)); B = H*A;
X = lyapc(A*A', -(A*B'+B*A'))

I’m not confident about the derivation, though, because of the complex derivatives.

The key points are that the Lagrangian becomes L(X, \Lambda) = \Vert XA - B \Vert_{\text{F}}^2 + \Re \langle \Lambda, X - X^\dagger \rangle_{\text{F}} and that you then use CR (+ matrix) calculus to get the equation \nabla_{X^\dagger} L = 0 = XAA^\dagger - BA^\dagger + \frac{1}{2}(\Lambda - \Lambda^\dagger). It follows that XAA^\dagger - BA^\dagger is skew-Hermitian and the remaining algebra goes through.
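
(Spelling out the remaining algebra: since X = X^\dagger, adding \nabla_{X^\dagger} L = 0 to its own adjoint cancels the \frac{1}{2}(\Lambda - \Lambda^\dagger) term and leaves

X(AA^\dagger) + (AA^\dagger)X = BA^\dagger + AB^\dagger

which is the Lyapunov equation above.)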

Ah duh, I forgot it was already established that the solution doesn’t always exist.

The derivatives of the Lagrangian? Those are just derivatives of a complex-to-real system, so the math works out the same. When unsure, you can always just map

x + i y \quad\mapsto\quad \begin{pmatrix} x & y \\ -y & x \end{pmatrix}

and do the derivatives on that. This will map your Hermitian matrices to symmetric matrices with the same algebraic properties (but with some not-so-nice properties: twice as many eigenvalues, and worse conditioning).
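
For example (realify is just an illustrative name; the blockwise form is the same map up to a permutation of rows and columns):

using LinearAlgebra

# map x + iy entrywise to [x y; -y x]; written blockwise here for brevity
realify(A) = [real(A) imag(A); -imag(A) real(A)]

H = Matrix(hermitianpart!(randn(ComplexF64, 4, 4)))
R = realify(H)
issymmetric(R)                                                   # true: Hermitian maps to symmetric
eigvals(Symmetric(R)) ≈ repeat(eigvals(Hermitian(H)), inner=2)   # each eigenvalue doubled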

You have to be a little careful here, I think, because you need to stick a real part in front of the inner product with the Lagrange multiplier (in order for duality to make sense). But yes, with that caveat, and a little care, it essentially just goes through. CR calculus seems like magic (or an abomination unto the rules you learned in complex analysis), until it’s explained, though — it always seemed a shame to me that so few students are exposed to it, even in a complex-analysis course.

Could you expand on this? It seems that if you wrote \mathrm{Re}(\langle \Lambda, X - X^\dagger\rangle) instead of \langle \Lambda, X - X^\dagger\rangle, you’d lose half your Lagrange multipliers, so you’d only fix half as many degrees of freedom as you need to.

Remember, first, that Lagrange duality comes from minimizing the Lagrangian over X and maximizing over \Lambda, i.e. saddle points of the Lagrangian, which only makes sense if the Lagrangian is real.

Furthermore, we don’t lose anything by taking the real part, since \operatorname{Re} \langle \Lambda, C \rangle = \operatorname{Im} \langle -i\Lambda, C \rangle: so, finding the extremum over all \Lambda effectively looks at both the real and imaginary parts (because for each \Lambda there is also a -i\Lambda).

Another way of thinking about it is to realize that \operatorname{Re} \langle \Lambda, C \rangle = \langle \operatorname{Re} \Lambda, \operatorname{Re} C \rangle + \langle \operatorname{Im} \Lambda, \operatorname{Im} C \rangle, so it is equivalent to the ordinary inner product if you view complex numbers as just sub-vectors of two real components, i.e. if you treat \Lambda and C as real vectors of twice the length.
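
Both identities are easy to check numerically, e.g. with dot from LinearAlgebra (which is conjugate-linear in its first argument, the convention for which the first identity holds):

using LinearAlgebra

# numerical check of the two identities above, with ⟨Λ, C⟩ = dot(Λ, C) = tr(Λ'C)
Λ, C = randn(ComplexF64, 5, 5), randn(ComplexF64, 5, 5)
real(dot(Λ, C)) ≈ imag(dot(-im*Λ, C))                             # true
real(dot(Λ, C)) ≈ dot(real(Λ), real(C)) + dot(imag(Λ), imag(C))   # true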