It’s well-conditioned for the eigenvalue, not for eigenvectors.
The part of this I’m struggling with is that in the problem we’re discussing, X certainly doesn’t have to be ill-conditioned. As per the original post, one correct solution to this problem is
The condition number of this matrix in the operator 2-norm, after normalizing each column, is 6.78. Of course, the matrix that’s actually returned by the failing eigensolver is singular (condition number approximately 1.27 \cdot 10^{17}), but it’s unclear to me why this explains anything and isn’t just a tautology. Yes, if I repeat an eigenvalue/eigenvector pair twice, I get a singular matrix that doesn’t diagonalize any matrix, let alone any close to A, even though each eigenvalue/eigenvector pair is correct for a matrix very close to A, but that’s hardly a surprise and is true for any matrix. Clearly, a well-conditioned matrix of eigenvectors exists in this case—what makes it hard to find? (To be clear, this is a genuine question out of curiosity, not a rhetorical question trying to argue you’re wrong.)
The immediate, but likely naive, counterpoint here is that defective matrices are special, and while they exist in every neighborhood of a matrix with a multiple eigenvalue, they form a measure-zero subset. Generic matrices are diagonalizable. EDIT: At least that’s what I thought, but maybe this is incorrect when restricting to the subspace of matrices with a given algebraic multiplicity?
Hence, my intuition has been that, in inexact arithmetic, you wouldn’t expect to be able to compute the normal form of a truly defective matrix because rounding errors will give you the eigendecomposition of a nearby diagonalizable matrix instead. However, it’s surprising to me that you would ever have the opposite problem (except perhaps in engineered examples where the input is a small perturbation away from a defective matrix and tweaked such that the rounding errors of the algorithm exactly cancel this perturbation).
For the same reason, if your matrix has a repeated eigenvalue you would expect the corresponding values returned by the algorithm to be rounding errors apart, not exactly identical, but such things are par for the course in any floating point operation and should never be a problem.
From a lawyerly point of view LAPACK is not “failing” here; the documentation for nonsymmetric eigenvector routines says they “compute some or all of the … eigenvectors” but nowhere claims completeness. The algorithms are basically forward or backward substitution, and will happily duplicate vectors for degenerate eigenvalues. Perhaps the Julia documentation needs to be made consistent.
If one wants the full subspace associated with a (practically) degenerate eigenvalue \lambda, the recommended approach is to compute a Schur decomposition, then use ordschur to push the relevant vectors to the front. (Note that LAPACK does the Schur decomposition as part of eigen-analysis, so no duplication is needed at a low level.)
As @mstewart points out, whether that subspace is spanned by eigenvectors is a subtle and often ill-conditioned question.
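For concreteness, here is a minimal sketch of that approach in Julia. I’m assuming the matrix from the original post is the rank-one A written out below (it is consistent with the Schur factors quoted later in the thread), and the tolerance for picking out the cluster is an ad hoc choice:

```julia
using LinearAlgebra

A = [1.0 1.0 1.0; 1.0 1.0 1.0; -1.0 -1.0 -1.0]  # assumed: the matrix from the original post

F = schur(A)
# select the cluster of (numerically) zero eigenvalues; the tolerance is ad hoc
select = abs.(F.values) .< 1e-8 * opnorm(A)
G = ordschur(F, select)          # reorder so the selected cluster comes first
k = count(select)
V = G.Z[:, 1:k]                  # orthonormal basis of the associated invariant subspace
# A * V ≈ V * G.T[1:k, 1:k], so V spans the invariant subspace for the zero cluster
norm(A * V)                      # tiny for this example
```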
I should mention that one of the LAPACK principals, Jim Demmel, literally wrote the book on the numerics of Jordan decompositions, so I don’t think the library is shirking here. The defective case is just outside the current scope of LAPACK.
It is true that X doesn’t have to be ill-conditioned, but finding a well conditioned matrix if one exists is not generally easy. And it is true that generic matrices have distinct eigenvalues and are therefore diagonalizable. And you have hit on another fact with your edit: In the set of matrices with a repeated eigenvalue, a single Jordan block with only one eigenvector is the generic case. If you are lucky and an algorithm does an unexpectedly good job preserving algebraic multiplicity by keeping repeated eigenvalues close, backward errors might still mean your problem now effectively has a single Jordan block, even if it was originally diagonalizable. That is more or less what happens here: The QR algorithm does a better job at keeping the eigenvalues close than it does at preserving the Jordan block structure. That’s a little unusual based on considerations of what the most generic case is, but there are matrices for which the QR algorithm gives greater accuracy in eigenvalues than you might expect (e.g. certain types of graded matrices). So it can happen. And surprising things do fairly often happen for small matrices with integer elements.
That is certainly a very likely outcome and is part of why computing a meaningful Jordan form is difficult. In the presence of errors, either in the given matrix or introduced as backward errors during computation, computing a Jordan form is more naturally reformulated as a distance problem: Is my matrix close to a matrix with a particular Jordan form? The answer is always “yes” if you are asking “is my matrix close to a diagonalizable matrix with distinct eigenvalues.” If you want to ask whether the matrix is close to a matrix with repeated eigenvalues, things get hard.

Your computed eigenvalues might actually not be all that close if they are ill-conditioned. So it’s hard to set thresholds on how close they should be. If you set the threshold too tight, you miss the fact that your matrix really is close to a matrix with repeated eigenvalues. If you set it too loose then you identify a cluster of eigenvalues that is not actually all that tightly clustered. The geometric multiplicity of that cluster (and the number of Jordan blocks) is the dimension of the null space of A-\lambda I. You could compute that using the SVD, but what should \lambda be? Should it be a value in the middle of the cluster? If the unit roundoff is around 10^{-16} and the eigenvalues in the cluster vary by 10^{-8} due to ill-conditioning, it’s likely that A-\lambda I will not have a numerical null space of the correct dimension. If you pick \lambda to be equal to one of the computed eigenvalues in the cluster, then A-\lambda I will have a numerical null space of dimension at least one, but it very likely won’t be of the same dimension you would have gotten if you somehow were able to choose \lambda to be close to an exact eigenvalue of A. And as a result of this, you end up with fewer Jordan blocks and a lower geometric multiplicity than you were hoping for.

This is all to try to figure out the number of Jordan blocks. Identifying the sizes of those Jordan blocks is even harder. And computing the full Jordan form requires non-orthogonal transformations, so there are further stability issues there. The algorithms that try to do this are very expensive (I’m not really up to date on this, but at one point, I’m pretty sure they were all O(n^4) or worse) and not very reliable.
Since there is no algorithm that is guaranteed to do the right thing in terms of giving you whatever geometric multiplicities you might have wanted, LAPACK just does something simple. It considers the computed eigenvalues one by one and computes an eigenvector that gives a small residual for each one. Those might or might not end up being linearly independent and might or might not give you a well-conditioned matrix.

It is perhaps interesting to consider the algorithm and how it plays out on this problem. The first step is to compute a Schur decomposition A=Q T Q^T where T is quasi-triangular (i.e. possibly with 2\times 2 blocks on the diagonal for complex conjugate pairs of eigenvalues). In this problem it is triangular, so I’ll assume that. You then try to find an eigenvector matrix X for T, and the eigenvector matrix for A will be QX. Clearly the (1,1) element of T is an eigenvalue with eigenvector e_1. To get the others, we partition T as

T = \begin{bmatrix} T_{11} & t_{12} & T_{13} \\ 0 & t_{22} & t_{23} \\ 0 & 0 & T_{33} \end{bmatrix}

where T_{11} is (k-1)\times(k-1) and we seek an eigenvector for t_{22}, the k-th diagonal element. We seek an eigenvector of the form

x = \begin{bmatrix} y \\ 1 \\ 0 \end{bmatrix}

where y\in R^{k-1}. We then want to solve T x = t_{22} x. This is equivalent to

(T_{11} - t_{22} I) y = -t_{12},

or y = -(T_{11} - t_{22} I)^{-1} t_{12}. This can be solved using back substitution. Note that if t_{22} is close to an eigenvalue of T_{11}, this can be ill-conditioned or perhaps even exactly singular. Nevertheless, with some safeguards for exact singularity and overflow, it does give an eigenvector for t_{22} that gives a small relative residual. But it makes no further guarantees about linear independence or conditioning of the eigenvector matrix.
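To illustrate, here is a stripped-down Julia version of that computation for a genuinely triangular T, with none of the safeguards against exact singularity and overflow that the LAPACK routine has (triangular_eigvecs is just a name I made up):

```julia
using LinearAlgebra

# Naive version of the eigenvector computation described above: for each
# diagonal entry T[k,k] of an upper triangular T, solve
#   (T11 - T[k,k] I) y = -t12
# by back substitution and set the k-th column of X to [y; 1; 0; ...; 0].
function triangular_eigvecs(T::AbstractMatrix)
    n = size(T, 1)
    X = Matrix{float(eltype(T))}(I, n, n)
    for k in 2:n
        T11 = @view T[1:k-1, 1:k-1]
        t12 = T[1:k-1, k]
        # throws SingularException if T[k,k] exactly equals an earlier diagonal entry
        X[1:k-1, k] = -(UpperTriangular(T11 - T[k, k] * I) \ t12)
    end
    return X
end
```

Applying this to the T factor shown below should reproduce the matrix X that appears further down, huge entries and all.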
So what happened with this matrix? The Schur decomposition is
julia> schur(A)
Schur{Float64, Matrix{Float64}, Vector{Float64}}
T factor:
3×3 Matrix{Float64}:
1.0 1.41421 2.44949
0.0 9.86076e-32 2.22045e-16
0.0 0.0 0.0
Z factor:
3×3 Matrix{Float64}:
0.57735 0.816497 0.0
0.57735 -0.408248 0.707107
-0.57735 0.408248 0.707107
eigenvalues:
3-element Vector{Float64}:
0.9999999999999996
9.860761315262648e-32
0.0
As @stevengj noted, the eigenvalues are well conditioned and the QR algorithm did a great job here. Much better than we might have expected in double precision. But somewhat worryingly, the (2, 3) element of T is much larger than the computed zero eigenvalues. If we consider the (2,2) element, but not the (2,3) element, to be negligible, then the block T[2:3, 2:3] is essentially a 2\times 2 Jordan block for \lambda = 0. Due to backward errors, our matrix is now much closer to a matrix with a single Jordan block for \lambda = 0 than it is to a diagonalizable matrix with a repeated zero eigenvalue. This is a very simple case in which we preserved a tight cluster of eigenvalues and you could compute the null space for A-\lambda I with \lambda = 0 to ascertain the geometric multiplicity. However, that is not typical, as noted above.
How does this play out when we do the eigenvector computation? The matrix X is
3×3 Matrix{Float64}:
1.0 -1.41421 3.18453e15
0.0 1.0 -2.2518e15
0.0 0.0 1.0
The large elements are due to the severe ill-conditioning of T[1:2, 1:2] - T[3,3]*I.
Once the eigenvectors are normalized, the second and third eigenvectors are essentially linearly dependent. This is not what we wanted, but we did, in fact, get a matrix T which is much closer to a matrix with a single Jordan block for \lambda = 0 than it is to a diagonalizable matrix for which \lambda = 0 has geometric multiplicity 2. So I think it’s not an entirely unreasonable answer either, especially given the difficulty of guaranteeing anything better.
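You can see where those numbers come from by redoing the back substitution for the last column with the (rounded) T entries displayed above:

```julia
using LinearAlgebra

T = [1.0 1.41421 2.44949;
     0.0 9.86076e-32 2.22045e-16;
     0.0 0.0 0.0]                       # rounded values from the Schur factor above

y = -(UpperTriangular(T[1:2, 1:2] - T[3, 3] * I) \ T[1:2, 3])
# y ≈ [3.2e15, -2.25e15]: the second component is -T[2,3]/T[2,2], the ratio of the
# off-diagonal element to the tiny computed eigenvalue
```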
Thanks so much for the thorough explanation! Very enlightening.
So if you have a double eigenvalue, it’s actually quite likely that roundoff will bring you closer to a defective matrix than to a non-defective matrix without repeated eigenvalues. There’s an upper triangular 2x2 block where the diagonal values should be identical and the off-diagonal value should be zero, but in finite precision, they will vary within the roundoff error, and if the computed off-diagonal value happens to be larger than the computed difference between the diagonal values, your matrix now looks defective. Without knowing anything about the relative conditioning of the diagonal and off-diagonal elements, that looks like a 50/50 proposition.[1]
About that last point, however—if the repeated zero eigenvalue were somewhat worse conditioned, might this problem be less likely to occur? The roundoff error on the diagonal would generally be larger and more likely to dominate over the off-diagonal element. Or would the off-diagonal element typically be worse conditioned as well?
The missing ingredient in my previous reasoning was the upper triangularity of the Schur form. If you also had to consider roundoff error in the lower left element of the 2x2 block, the eigenvalue splitting subspace would have higher dimension than the defectiveness-inducing subspace.
More accurately, for the computed eigenspace to collapse, the off-diagonal element has to be orders of magnitude larger than the difference between the diagonal values. Their ratio determines the new eigenvector’s projection onto the existing eigenvector vs. the length of the component orthogonal to it. So perhaps not quite 50/50 that the matrix will appear defective (in the sense of obtaining a rank-deficient X to machine precision), but in floating point there’s plenty of room at the bottom, so still quite likely. (Bear with my probabilistic heuristics; I realize it’s not rigorous to argue as if the rounding errors were random samples.)
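To make that concrete: for an upper triangular block [a t; 0 b], the eigenvector for b is proportional to [t, b - a], so it collapses onto e_1 (the eigenvector for a) exactly when |t| dwarfs |b - a|. A quick toy check with numbers of the sizes that appear in the T factor above:

```julia
using LinearAlgebra

# eigenvector of [a t; 0 b] for the eigenvalue b, normalized
collapse(a, t, b) = normalize([t, b - a])

collapse(1e-31, 2e-16, 0.0)   # ≈ [1.0, -5e-16]: numerically parallel to e_1
collapse(1e-16, 2e-31, 0.0)   # ≈ [2e-15, -1.0]: off-diagonal negligible, no collapse
```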
I think that’s more typical. You sort of hinted at it yourself in the previous post I was responding to, as your intuition about what would happen, and it’s a solid intuition most of the time. The eigenvalues are very likely to split significantly under a perturbation from the backward error, and you then have a chance to get an eigenvector matrix that is at least not numerically singular. But it will likely still be a very ill-conditioned matrix. Whether that is a good or bad thing depends on what the user of the code wants. Compared to this example, you are in a worse position to say whether you have repeated eigenvalues. But maybe having eigenvalues of a nearby matrix and a not too horribly ill-conditioned eigenvector matrix is good enough or even better.
This was an extreme case. Not only were the eigenvalues well conditioned, but the QR algorithm computed an amazingly good Schur decomposition in which the eigenvalues were only different by about 10^{-31}, making the eigenvector computation for the third eigenvector essentially ill-posed. For this problem, what LAPACK did didn’t get you that additional eigenvector. But the Schur decomposition was great. With that cluster of eigenvalues, you knew A was close to a matrix with two zero eigenvalues. All you have to do is look at the null space of A to get both eigenvectors. So depending on what you want, you can spin this as a major win for the QR algorithm that regrettably made the standard eigenvector computation much worse but left an opening to do something much better (at the cost of more computation).
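In Julia, that “something much better” is essentially a one-liner. A sketch, again assuming the rank-one matrix from the original post:

```julia
using LinearAlgebra

A = [1.0 1.0 1.0; 1.0 1.0 1.0; -1.0 -1.0 -1.0]  # assumed: the matrix from the original post

N = nullspace(A)                 # SVD-based orthonormal basis: two eigenvectors for λ = 0
v1 = normalize([1.0, 1.0, -1.0]) # eigenvector for λ = 1
X = [v1 N]
cond(X)                          # modest, unlike the numerically singular matrix from eigen
```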
I mentioned that I saw this happen with symmetric matrices a while back. I had given a class assignment in which students were supposed to use the Lanczos algorithm and I asked them to try it without reorthogonalization. This gave a bunch of repeated ghost eigenvalues, which I wanted them to see. Python’s eig uses the nonsymmetric QR iteration and then does the eigenvector computation. If you simply notice that the triangular factor T is numerically diagonal, you can skip that and return orthonormal eigenvectors. But if you use the nonsymmetric eigenvector computation from LAPACK, it can produce a numerically singular eigenvector matrix even for a symmetric matrix.
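The check I have in mind is nothing fancy; a sketch, where the function name and the tolerance are just made up for illustration:

```julia
using LinearAlgebra

# If the (quasi-)triangular Schur factor is numerically diagonal, the Schur
# vectors themselves are orthonormal eigenvectors (of a nearby matrix);
# otherwise fall back to the usual eigenvector computation.
function eigvecs_or_schurvecs(A; tol = size(A, 1) * eps(Float64) * opnorm(A))
    F = schur(A)
    offdiag = norm(triu(F.T, 1))     # size of everything above the diagonal of T
    return offdiag <= tol ? F.Z : eigvecs(A)
end
```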
Regarding the conditioning of the off-diagonal element, I don’t know of any reason why that would have to be well conditioned and I wouldn’t expect it to be in general. So, even for a cluster of 2 eigenvalues I wouldn’t expect the off-diagonal to be small enough in general to mitigate the effect of having a close but somewhat split cluster of eigenvalues on the diagonal of T.
As an example where an eigenvalue is not well-conditioned, consider the matrix B = [4 2 3; 2 3 2; -3 -2 -2], which is exactly defective, with a Jordan form of

\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}.

In this case, the repeated eigenvalue of 1 is ill-conditioned, and eigen gives:
julia> eigen(B)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
0.9999999861848313
1.0000000138151681
3.0
vectors:
3×3 Matrix{Float64}:
0.707107 -0.707107 0.57735
9.7688e-9 9.7688e-9 0.57735
-0.707107 0.707107 -0.57735
in which the repeated eigenvalue \lambda = 1 has lost half of its significant digits.
(Not surprising, since perturbation theory for defective matrices predicts that an O(\epsilon) perturbation will yield O(\sqrt{\epsilon}) shifts in the repeated eigenvalue.)
(Of course, technically the repeated eigenvalue \lambda = 0 in the original example loses all of its significant digits, but the error was still small compared to the biggest |\lambda| or the operator norm of the matrix, whereas here that’s not the case.)
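As a quick sanity check of that O(\sqrt{\epsilon}) prediction:

```julia
julia> sqrt(eps(Float64))   # predicted scale of the eigenvalue shift for an O(ε) perturbation
1.4901161193847656e-8
```

which is the same order of magnitude as the \approx 1.4 \cdot 10^{-8} shifts in the computed eigenvalues above.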
I have been hesitant to say anything on this since I’d be plugging my own work in a thread in which this is a tangent. (And also the only code I have is just proof of concept research code.) But you can let B be the semidefinite matrix ones(3,3) and let A be the indefinite diagonal and then use the approach from the spectral transformation Lanczos algorithm from Ericsson and Ruhe 1980 to solve the generalized eigenvalue problem A x = \lambda B x. For this problem, this obviously reverses the role of A and B, but that just gives you reciprocal eigenvalues. Specifically if you get a low rank factorization B=LL^T (which might be a pivoted Cholesky factorization), you then solve the eigenvalue problem

L^T (A - \sigma B)^{-1} L u = \theta u
for some shift \sigma. The eigenvalues \lambda are given by \lambda = \theta/(1+\sigma \theta) and the eigenvectors of the original problem by x = (A-\sigma B)^{-1} L u. Eigenvectors for \lambda=\infty can be obtained from the null space of B. To apply this as a direct method, forming the symmetric eigenvalue problem can be facilitated by an LDL^T factorization of A-\sigma B. This is a bit more work than the standard approach using just Cholesky factorization of B, but the cost of both methods is dominated by the symmetric eigenvalue decomposition, so it’s not that bad. This approach should work just fine for this problem.
In general, with an appropriate implementation, if you pick a shift \sigma that is not too large and is not pathologically close to a generalized eigenvalue \lambda, then this has much better stability properties than the standard algorithm, particularly for smaller generalized eigenvalues. Finding an acceptable shift is often not hard and you don’t need to guarantee that A-\sigma B is well conditioned, just that you don’t pick \sigma to match an eigenvalue \lambda to multiple digits. I published a paper with an error analysis last year.
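For what it’s worth, here is a small Julia sketch of the procedure described above. The indefinite diagonal A is made up (the actual matrix from the other thread isn’t reproduced here), B = ones(3,3) is factored by inspection rather than by a pivoted Cholesky, and a dense solve stands in for the LDL^T factorization:

```julia
using LinearAlgebra

B = ones(3, 3)                    # semidefinite, rank one
A = Diagonal([1.0, -2.0, 3.0])    # made-up stand-in for the indefinite diagonal
L = ones(3, 1)                    # B = L * L' (found by inspection here)

σ = 0.1                           # shift: not huge, not close to a generalized eigenvalue
K = A - σ * B                     # in practice, factor this once with LDLᵀ (bunchkaufman)
M = Symmetric(L' * (K \ L))       # small rank(B) × rank(B) symmetric eigenproblem
θ, U = eigen(M)

for (θi, ui) in zip(θ, eachcol(U))
    λ = θi / (1 + σ * θi)         # eigenvalue relation quoted above
    x = K \ (L * ui)              # eigenvector of the original problem
    println((λ, norm(B * x - λ * (A * x)) / norm(x)))   # residual for B x = λ A x
end
# eigenvectors for the remaining eigenvalues (zero or infinite, depending on
# which way the pencil is oriented) come from nullspace(B)
```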