Help with numerically unstable algorithm

Hi,

I have this small algorithm that is supposed to create a matrix based on random eigenvalues and eigenvectors. Since the last few rows of the matrix need to satisfy some criteria, I end up with essentially the following algorithm. It turns out though that the algorithm becomes more and more unstable as p increases. Using BigFloat improves this somewhat but the algorithm still becomes numerically unstable. There might be a mathematical reason that I am completely overlooking right now that would explain the numerical instability. In that case please let me know. If there is no mathematical reason for the numerical instability, is there anything I could do to improve the stability?

using LinearAlgebra

k = 3
p = 3
# Random eigenvalues on the diagonal, rounded to two digits
lambda = Diagonal(round.(rand(k*p); digits = 2))
lambda_inv = inv(lambda)
# Build Q block row by block row: each block is the previous one times lambda^-1
Qblocks = Vector{Matrix}(undef, p)
Qblocks[1] = rand(k, k*p)
for i = 2:lastindex(Qblocks)
    Qblocks[i] = Qblocks[i-1] * lambda_inv
end
Q = reduce(vcat, Qblocks);
# Normalize each column of Q
Q = mapslices(x -> x ./ norm(x), Q; dims = 1)
C = Q * lambda * inv(Q);
# How far is Q * inv(Q) from the identity?
@show maximum(abs.(Q * inv(Q) - I))
# How well do the lower rows of C reproduce the required [I 0] structure?
@show maximum(abs.(C[(k+1):end, 1:(k*(p-1))] - I));

I’m not an expert, but it’s well-known that the matrix inverse is quite unstable to compute, except for some special types of matrices, so I’m not sure what this is supposed to show. Is Q one of those special types of matrices? If so, perhaps you should convert it to a more specific type than Matrix, so that the algorithm relevant for that matrix type gets used.

EDIT: if Q is orthogonal, doing a QR decomposition on Q yields Q itself as the Q-part of the decomposition (up to column signs). Afterwards I guess you could use the factors of the decomposition for more numerical stability.

EDIT2: Q isn’t orthogonal. And if it were orthogonal, you could use the fact that the inverse equals the transpose.
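For what it’s worth, a quick numerical check of orthogonality (a minimal sketch, assuming Q from the snippet above and that LinearAlgebra is loaded):

@show opnorm(Q' * Q - I)   # essentially zero only if Q is orthogonal; then inv(Q) could be replaced by Q'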

I set p = 10; in that case \kappa_2(Q) was about 8.6\times 10^9. The formula C = Q * lambda / Q is preferable to what you did, but even so C is going to have errors proportional to \kappa_2(Q) \| \Lambda\|_2 u, where u is the unit roundoff and \Lambda is the diagonal matrix of eigenvalues. The way you construct Q is also pretty much guaranteed to produce an ill-conditioned matrix, for the same reasons that Vandermonde matrices and Krylov sequences are ill-conditioned except for a very special choice of eigenvalues. So the errors are about as large as I would expect under the circumstances, just based on the inherent nature of what you are trying to do.
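Concretely, with Q and lambda from your snippet (a minimal sketch; LinearAlgebra assumed to be loaded):

@show cond(Q)                                        # grows rapidly with p; about 8.6e9 at p = 10 in my run
C = Q * lambda / Q                                   # right division solves against Q instead of forming inv(Q)
@show maximum(abs.(C[(k+1):end, 1:(k*(p-1))] - I))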

If you are free to choose special eigenvalues, you can generate (with good probability) a better conditioned Q by making eigenvalues equal to roots of unity:

lambda = Diagonal(exp.(im*2*pi*(0:k*p-1)/(k*p)))

That dropped the errors back down when I ran it. But I’m not sure if carefully choosing the eigenvalues like that is playing by the rules you need to follow. Probably not, since you mention random eigenvalues. However, setting up the computation in this way is going to cause trouble.
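For completeness, here is the whole thing with that one change (a sketch, following your construction):

using LinearAlgebra

k = 3
p = 10
# Roots of unity as eigenvalues: all have modulus one and are well separated,
# so the Krylov-like construction of Q stays much better conditioned.
lambda = Diagonal(exp.(im * 2 * pi * (0:k*p-1) / (k*p)))
lambda_inv = inv(lambda)
Qblocks = Vector{Matrix}(undef, p)
Qblocks[1] = rand(k, k*p)
for i = 2:lastindex(Qblocks)
    Qblocks[i] = Qblocks[i-1] * lambda_inv
end
Q = reduce(vcat, Qblocks)
Q = mapslices(x -> x ./ norm(x), Q; dims = 1)
@show cond(Q)                                        # stays modest, unlike the random-eigenvalue case
C = Q * lambda / Q
@show maximum(abs.(C[(k+1):end, 1:(k*(p-1))] - I))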

If you approach it from the other direction, giving C the correct structure, filling in the rest of its elements with random numbers, and then solving that eigenvalue problem, there’s a possibility of getting a reasonably well conditioned eigenvector matrix and accurate eigenvalues that have some degree of randomness, although you won’t be able to control the distribution. But I’m also not sure if that’s playing by the rules. It’s a bit hard to make suggestions without knowing fully what you need.
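Something like this, for instance (a sketch; the scaling of the random block row is an arbitrary choice on my part, and k and p are as in your post):

using LinearAlgebra

# Give C the companion structure up front and fill the free entries with random numbers,
# then solve the eigenvalue problem for that C.
n = k * p
C = zeros(n, n)
C[1:k, :] = 0.1 * randn(k, n)                        # random top block row [A_1 ... A_p]
C[(k+1):end, 1:(n-k)] = Matrix{Float64}(I, n - k, n - k)
E = eigen(C)
@show cond(E.vectors)                                # the eigenvector matrix is often reasonably well conditioned
@show count(!isreal, E.values)                       # the eigenvalues are random, but some may come in complex pairs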

3 Likes

By construction, this produces a very badly conditioned Q matrix (and hence your C matrix is nearly defective), and so you are extremely susceptible to roundoff errors. Qblocks[i] is increasing exponentially with i and hence Q is dominated by the last few rows, so its columns are nearly linearly dependent.

You probably need to think more carefully about what sort of random matrix you want.

6 Likes

Hi all,

thanks for the replies. I will look into Vandermonde matrices and Krylov sequences to see if I can learn something from there.

A bit more on the background: I am trying to construct a companion matrix given randomly drawn eigenvalues. The reason for this is that the way we would usually create random companion matrices often leads to oscillatory behaviour in the Impulse Response Functions. To achieve non-oscillatory behaviour, I would need all eigenvalues to be real. So I thought the easiest way would be to start from random real eigenvalues and reconstruct the companion matrix, but it turns out that this is not that easy. If you have another way to draw a random companion matrix with real eigenvalues, please let me know.

What sort of “companion matrix” do you want? That term usually implies a matrix whose eigenvalues match the roots of a given polynomial, but in this case you aren’t supplying a polynomial so I’m not sure what you mean.

If you just want an n \times n matrix A with given n eigenvalues λ (e.g. λ = randn(n)), why not simply do X = randn(n, n) followed by A = X * Diagonal(λ) / X? (Or, if you want a Hermitian A, you could get a random orthogonal matrix Q by doing QR = qr(X); Q = QR.Q * Diagonal(sign.(diag(QR.R))) and then do A = Hermitian(Q * Diagonal(λ) * Q').)
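Spelled out as runnable code, something like this (a sketch of the two constructions above, with an arbitrary n):

using LinearAlgebra

n = 9
λ = randn(n)

# General matrix with prescribed eigenvalues λ:
X = randn(n, n)
A = X * Diagonal(λ) / X

# Hermitian variant via a random orthogonal matrix (QR of a random matrix, with a sign fix):
fact = qr(randn(n, n))
Q = Matrix(fact.Q) * Diagonal(sign.(diag(fact.R)))
Ah = Hermitian(Q * Diagonal(λ) * Q')

@show norm(sort(real.(eigvals(A))) - sort(λ), Inf)
@show norm(sort(eigvals(Ah)) - sort(λ), Inf)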

(If you then want a polynomial with those roots, given the matrix, you can compute the characteristic polynomial. Numerically, working with this polynomial directly is not usually a good idea, however.)

A precise mathematical statement of the problem would help. Reading between the lines, it sounds like maybe you are doing things with a state space system and perhaps want a random pair A and B in controllable canonical form with some prescribed randomly generated real eigenvalues? But I’m really guessing on that.

Thanks, and my apologies for not directly including the specific details. I’ll try to explain the problem in a bit more detail below.

The precise statement is something along the following lines:

Suppose a Vector Autoregressive (VAR) model of order p,

y_t = A_1y_{t-1} + \dots + A_py_{t-p} + \varepsilon_t.

By introducing x_t' = (y_t', y_{t-1}', \dots, y_{t-p+1}'), \epsilon_t' = (\varepsilon_t', 0, \dots, 0) and

C = \begin{bmatrix} A_1 & A_2 & \dots & A_{p-1} & A_p \\ I & O & \dots & O & O \\ O & I & \dots & O & O \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ O & O & \dots & I & O \end{bmatrix},

we can write the VAR(p) as x_t = Cx_{t-1} + \epsilon_t. This VAR(p) is stable iff the largest eigenvalue of C is less than one in absolute value or, equivalently, if all roots of the reverse characteristic polynomial \det(I - A_1 z - \dots - A_p z^p) lie outside the unit circle. It will have oscillatory behaviour whenever there is a complex conjugate pair of eigenvalues.

Ideally I would like to simulate the coefficient matrices A_1, \dots, A_p such that the associated companion matrix above has only real eigenvalues that are all less than one in absolute value. The algorithm above should create such a matrix but unfortunately is very unstable. The proposals by @stevengj sound interesting, but I don’t think they will work since C is not generally orthogonal and unfortunately has this special structure (sorry for not specifying the problem at the beginning).
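To make the starting point concrete in code, here is roughly what the usual construction looks like and how the oscillation problem shows up (a sketch; the scaling of the random A_j is an arbitrary choice):

using LinearAlgebra

k, p = 3, 10
A = [0.2 * randn(k, k) for _ in 1:p]                 # randomly drawn coefficient matrices A_1, ..., A_p
C = zeros(k*p, k*p)
C[1:k, :] = reduce(hcat, A)                          # top block row [A_1 A_2 ... A_p]
C[(k+1):end, 1:(k*(p-1))] = Matrix{Float64}(I, k*(p-1), k*(p-1))

vals = eigvals(C)
@show maximum(abs.(vals))                            # needs to be < 1 for stability
@show count(v -> abs(imag(v)) > 1e-12, vals)         # non-real pairs cause oscillatory impulse responses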

1 Like

This is likely to be an inherently ill-conditioned problem, especially if p is large relative to k. (Assuming the matrices A_j are k\times k). The eigenvalues of C can be sensitive to small errors in the matrices A_j, so even if you somehow computed the exact A_j and merely rounded them, you might have changed the eigenvalues dramatically, possibly even introducing conjugate pairs.

I don’t know of an analysis of that exact problem as stated for this sort of companion matrix in general, but in the case of k=1, this is equivalent to finding a set of polynomial coefficients for a polynomial with prescribed zeros. J. H. Wilkinson famously (famous among numerical analysts, anyway) wrote about the sensitivity of polynomial zeros as a function of polynomial coefficients in The Perfidious Polynomial and in earlier works. This is also related to ill-conditioning in the pole placement problem for multi-input systems in control engineering. A sensitivity analysis of that was given in An analysis of the pole placement problem II. The multi-input case by V. Mehrmann and H. Xu.

If you really need to stick to this exact formulation, you should probably lower your expectations for what is possible numerically for small k and large p. I don’t think you are going to find a minor change that is going to make what you want to do numerically reliable.
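To see the k = 1 version of this in a few lines (a sketch: prescribed real roots, form the monic polynomial coefficients, build the corresponding companion matrix, and recompute its eigenvalues; the degree is an arbitrary choice):

using LinearAlgebra

n = 25
r = rand(n)                                          # prescribed real roots in (0, 1)

# Coefficients of (x - r_1)...(x - r_n) in descending order, built by repeated multiplication
coeffs = foldl((c, ri) -> [c; 0.0] .- ri .* [0.0; c], r; init = [1.0])

# Companion matrix of the monic polynomial: coefficient row on top, identity below
C = zeros(n, n)
C[1, :] = -coeffs[2:end]
C[2:end, 1:(n-1)] = Matrix{Float64}(I, n - 1, n - 1)

recovered = sort(real.(eigvals(C)))
@show norm(recovered - sort(r), Inf)                 # the gap reflects how sensitive the roots are to the coefficients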

In control engineering, using state space models that don’t require this sort of companion form can help, although getting prescribed eigenvalues can still be problematic as shown in the paper by Mehrmann and Xu. I don’t have a background in statistics, but I think statisticians working with time series also try to work around these issues using state space models. Doing a little googling, there does appear to be a Julia package for state space time series models: StateSpaceModels.jl. The authors have a paper that cites a book: J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods. That sounds like something that might be helpful, although it’s outside my area and I haven’t seen the book.

3 Likes

I looked around in ControlSystems.jl which turns out to have a pole placement routine for multi-input systems. If you set

C = \begin{bmatrix} O & O & \dots & O & O \\ I & O & \dots & O & O \\ O & I & \dots & O & O \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ O & O & \dots & I & O \end{bmatrix},

and form a kp\times k matrix

B =\begin{bmatrix} I & O & \cdots & O \end{bmatrix}^T

Then the pole placement problem is to find a k\times kp matrix F such that C-BF has a prescribed set of eigenvalues. The matrix C-BF will then have the companion form you want. This can be done using

using ControlSystems
using LinearAlgebra

k = 3
p = 10
B = Matrix{Float64}(I, k*p, k)
C = diagm(-k => ones(k*p-k))              # companion "shell": identity blocks on the k-th subdiagonal
lambda = round.(rand(k*p); digits = 2)    # prescribed eigenvalues
F = place(C, B, lambda)                   # find F so that C - B*F has eigenvalues lambda
C = C - B*F

All the warnings about sensitivity of eigenvalues still apply. I did get a couple of conjugate pairs running this with k=3 and p=10. But I really don’t think you are going to do much better than this. It’s a respectable algorithm by experts in the field that attempts to solve this as robustly as possible. The eigenvalues are much closer to the prescribed values than when using your method. So it’s at least a substantial improvement.

Edit: I didn’t look too closely at the randomly generated eigenvalues before. The rounding causes some of the eigenvalues to be duplicated. Doing this with duplicated eigenvalues is just asking for numerical errors to split them into a conjugate pair. I would avoid that if possible.
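If exact duplicates aren’t actually needed (an assumption on my part), the simplest fix is to skip the rounding when drawing the prescribed eigenvalues:

lambda = rand(k*p)          # distinct with probability 1, so no duplicate eigenvalues to split
@assert allunique(lambda)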

2 Likes

You might be able to use place with higher-precision numerics?

1 Like

You can, but it doesn’t help much on this example if you go back to double precision. The following code

using GenericLinearAlgebra   # generic linear algebra routines (needed for the BigFloat computations)
using ControlSystems
using LinearAlgebra

k = 3
p = 12

# Double precision
B = Matrix{Float64}(I, k*p, k)
C0 = diagm(-k => ones(k*p-k))
lambda = rand(k*p)
F = place(C0, B, lambda)
C = C0 - B*F
@show norm(sort(eigvals(C)) - sort(lambda), Inf)

# The same computation in BigFloat
B_bf = BigFloat.(B)
C0_bf = BigFloat.(C0)
lambda_bf = BigFloat.(lambda)
F_bf = place(C0_bf, B_bf, lambda_bf)
C_bf = C0_bf - B_bf*F_bf
@show norm(sort(real.(eigvals(C_bf))) - sort(lambda_bf), Inf)

# Round the BigFloat result back to Float64 and look at the eigenvalues again
C_rounded = Float64.(C_bf)
@show norm(sort(real.(eigvals(C_rounded))) - sort(lambda), Inf)

produces output

norm(sort(eigvals(C)) - sort(lambda), Inf) = 1.8802887358226883e-8
norm(sort(real.(eigvals(C_bf))) - sort(lambda_bf), Inf) = 1.72837625884639080991231973613882468148565957750347046815941655995541472499086e-69
norm(sort(real.(eigvals(C_rounded))) - sort(lambda), Inf) = 7.225060660864813e-9

I think the problem is not so much with the computation of F but with the ill-conditioning of the eigenvalues of C-BF. The paper by Mehrmann and Xu even mentioned that there are examples where F is known exactly, but you get substantial errors in the eigenvalues just by forming C-BF (and adding in whatever backward errors are associated with solving the eigenvalue problem.) Here even just rounding the very accurate BigFloat version of C erases most of the benefit.
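One way to see this numerically is to look at the standard first-order eigenvalue condition numbers of the double-precision C from above (a sketch; with W = inv(V) the rows of W are left eigenvectors scaled so that w_i^T v_i = 1, and the condition number of \lambda_i is then \|w_i\| \|v_i\|):

E = eigen(C)
V = E.vectors
W = inv(V)
conds = [norm(W[i, :]) * norm(V[:, i]) for i in 1:size(C, 1)]
@show extrema(conds)        # large values mean the corresponding eigenvalues are very sensitive to perturbations of C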

2 Likes

Thank you. I tried this and it indeed outperforms any implementation that I have come up with so far. It also seems like my math skills are not good enough to make any further progress on this, so I guess I’ll stick with your solution for now and focus on small models.

Coming up with something better would be a tough standard to which to hold your math skills. People have been studying this problem in control engineering since the 1960s. The similarity transformation you used in your example is close to what people were doing before there was significant progress on improving the numerics in the 80s. Any improvement now would be a significant research result on a hard problem that people have worked on for decades. A big and generally applicable improvement would likely contradict what is known about the sensitivity of the problem.

But with that said, you might see some benefit from switching to a more general state space model. It could be worth looking into.

3 Likes