Best way to solve a highly ill-conditioned linear system


I would like for some advice to solve the following ill-conditioned linear system using Julia. My problem has the following form

\begin{pmatrix}x^+ \\ x^- \end{pmatrix}=D_1 AD_2A^{-1}D_1\begin{pmatrix}y^+ \\ y^- \end{pmatrix},

where I know y^- and x^+=Rx^-, and have to solve for x^- and y^+ (all these quantities are vectors). Now, the matrix A has a condition number of around 10, so there is no problem with it. However, D_1 and D_2 are diagonal matrices but whose elements range from 10^{-8} to 10^{10}, so they are really badly conditioned.

Upon naively solving the problem, by splitting the resulting matrix P=D_1 AD_2A^{-1}D_1 into two blocks of equations, I am getting a very wrong results, since even though I use x^+=Rx^- to relate both equations, after solving with a direct method, that condition is no longer satisfied, even though it was used to solve the system of equations.

The P matrix has a conditioned number of 10^{34}, so I guess the problem is related to just numerical effects, but I wonder what is the best way to attack this problem, since I am trying direct solvers from Julia but the results are not accurate.


There is a method Schur complement, which allows you to divide the matrix for factorization, repeatedly choosing a well-conditioned part of the original matrix as submatrix A. To do this, it is necessary to renumber the elements of the matrix so that the coefficients of the original matrix with the largest absolute values are in parts B, C, D.

The Schur complement method is very stable: with each iteration it reduces the condition number of the resulting matrix S. Therefore, at the last levels of decomposition one should expect quite working values of the condition number.

Among the disadvantages of the method: it is poorly parallelized, and it is advisable to use the AINV method, rather than LU, to invert matrices A.

1 Like

You might be able to eliminate one of the unknowns and get a smaller, albeit
no better conditioned problem. Suppose you write the matrix as

P = \left( \begin{array}{ll} P_{11} & P_{12} \\ P_{21} & P_{22} \end{array} \right)

and the equations as

\begin{array}{l} x^+ = P_{11} y^+ + P_{21} y^- \\ \\ x^- = P_{12} y^+ + P_{22} y^- \\ \end{array}

Since y^- is known and x^+ = R x^- you can express this mess as

R x^- - P_{11} y^+ = P_{21} y^-


R x^- - R P_{12} y^+ = R P_{22} y^-

Subtract the second from the first to get

(R P_{12} - P_{11}) y^+ = (P_{21} - R P_{22}) y^-

which is an equation for y^+. You can the recover x^- from

x^- = P_{12} y^+ + P_{22} y^-
1 Like

Yes, sure. I didn’t specify it but that is how I am trying to solve this system. The problem is that the system

is also very bad conditioned, so bad that if you compute x^+ from

you get a different result than doing

even though that step is used when solving the system of equations. That is why I am interested in a way to solve the system while avoiding the numerical issues.

interesting, I will take a try

I doubt it’s a matter of finding a better algorithm. There’s not really anything in your problem statement that rules out inherent sensitivity of the solution as a function of the given data, regardless of what algorithm you use. In fact, it looks like it is almost guaranteed unless there is some additional structure. The usual standard for stability in solving a linear system is if an algorithm is backward stable, so that a computed solution is the exact solution for slightly perturbed matrices. But in problems of the form you have described, small perturbations in the matrices can greatly change the solution.

Take a 2\times 2 example in which D_1 = I and

A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}, \qquad D_2 = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}, \qquad R = r

with a \ll b. A little algebra gives x^- = b y^-, x^+ = rb y^-, and

y^+ = \left(1 + \frac{b}{a} (r-1)\right) y^-.

Suppose r=1 so that y^+ = y^-. If r comes with some small relative error so that you really have \hat{r} = 1+\epsilon and there are no further errors in your algorithm, then you would compute

\hat{y}^+ = \left(1 + \frac{b}{a} ((1+\epsilon)-1)\right) y^- = \left(1 + \frac{b}{a} \epsilon\right) y^- = \left(1 + \frac{b}{a} \epsilon\right) y^+.


\frac{\hat{y}^+ - y^+}{y^+} = \frac{b}{a} \epsilon

which is a large relative error in y^+ even if \epsilon is very small. If \epsilon is of the order of the unit round-off, say about 10^{-16}, and b/a is around 10^{18}, as you suggest might be the case, then

\frac{\hat{y}^+ - y^+}{y^+} = 100.

So you have no correct digits and don’t even have the correct order of magnitude. Another way to look at it is that getting full accuracy in y^+ in double precision requires \epsilon = 10^{-16} / 10^{18} = 10^{-34}. So you would need to know R accurately to about 34 digits in this example. You need to know R to more than 18 digits to have any hope of getting any correct digits in y^+ at all. You can’t even store R with that many digits in double precision. If R involves measured data with larger errors, you are almost certainly in an even worse situation than that. This is all still true, no matter what algorithm you use.

Sometimes a specific algorithm can introduce unnecessary ill-conditioning in a specific step and a better algorithm helps. But if it is your problem formulation itself that is inherently ill-conditioned, it’s usually a sign that you need to reconsider the problem formulation.


Okay, thank you for this useful insight into the problem. That simplified sketch really helped me, and I’ll see how I can rearrange the equations so the whole process is more stable.

1 Like