What is the best way to solve an ill-conditioned linear problem?

Some background first: I’m a physicist and have little knowledge about numerical analysis and linear algebra, so maybe this is a meaningless question.

I have a matrix A in double precision, which has elements ranging from 1e-23 to like 1e-3. The condition number is large, cond(A)>1e20. I want to solve Ax=b for a column vector b. Obviously inv(A) doesn’t work well, but it seems even A\b has some error.

I found A*(A\b) - b is around 1e-10, which is not bad, but can we make it near machine precision? Or is this something impossible?

Thank you.

2 Likes

Have you tried svd(A)\b?

1 Like

Just tried it. In some cases it is better than A\b (I think the underlying algorithm is LU decomposition), but worse in some other cases.

Thanks for the suggestion.

You might find this blog entry useful for your problem.

Edit: I see that there is a Julia package for iterative refinement: GitHub - RalphAS/IterativeRefinement.jl: Multi-precision iterative refinement for linear matrix systems. You might give this a try.

It’s used in the post @PeterSimon linked to, but to add a little emphasis: For a computed solution \hat{x} you want to look at the relative residual

\frac{\|A\hat{x} - b\|}{\|A\| \|\hat{x}\|}

where the matrix norm is the one induced by the vector norm. opnorm(A, Inf) should be fast to compute. Looking at the absolute residual \|A\hat{x} - b\| as you did in your post won’t really tell you a lot. If the relative residual is close to the unit roundoff, that’s what you should expect when there are no stability issues. In that case, you probably won’t have much to gain by looking at other linear system solvers. Of course, there are larger questions of whether the solution to such an ill-conditioned problem is meaningful and whether you should consider regularization; but those things are highly problem specific.
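For concreteness, here is a minimal sketch of that check in Julia (xhat is just an illustrative name for the computed solution, with A and b your matrix and right-hand side):

using LinearAlgebra

xhat = A \ b                                                           # computed solution
relres = norm(b - A*xhat, Inf) / (opnorm(A, Inf) * norm(xhat, Inf))    # normwise relative residual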

1 Like

Thank you! I actually checked (Ax - b) ./ b. Do you think this is a good metric?

Very interesting blog! Thank you. I will try this method.

No. It’s a bad metric. I would take the normwise relative residual using the formula I gave before. That’s really the first thing you should look at. The presence of \|\hat{x}\| in the denominator is highly significant. There is a componentwise metric you can look at, but dividing the residual by b componentwise is not it. And most algorithms will not yield small componentwise backward error even when they have small normwise errors, although iterative refinement can help with that if it’s really something you need. If you are not that confident with numerical linear algebra, you are actually going into a trickier topic by trying to look at componentwise errors. I would not recommend it unless you have specific knowledge that you need it.

Thanks a lot. I followed your suggestions, and found the error is small (<1e-24). How do I interpret this?

Before giving the long answer, the short answer is that x probably had large norm, that you got a surprisingly small residual for your problem, and that you can’t really expect to do any better by simply switching algorithms for just solving the system.

In fact, assuming you used double precision, that’s small enough that it makes me think you had a really small problem and got lucky, or you had some really special problem structure, or there is some mistake in computing the relative residual. From the limited information I have, I don’t know if you should feel lucky or suspicious. If it were me, I would always start with suspicion and double-check:

norm(b - A * x, Inf)
opnorm(A, Inf)
norm(x, Inf)

If you mixed norms and did something like use the Frobenius norm of A (i.e. norm(A)) on a larger problem, that might give an unexpectedly small result. That’s the only mistake I can think of that would be easy to make, in part because Matlab doesn’t give the Frobenius norm for norm(A), which of course will trip up anyone coming from Matlab. If for those values

norm(b-A*x, Inf) / (opnorm(A, Inf) * norm(x, Inf))

is really about 10^{-24}, I suppose you can feel lucky, although seeing something like that unexplained bugs me. If it’s closer to 10^{-16} you can still feel very satisfied.

The reason for all this is that there is a theorem that says that there exists a matrix \Delta A with \|\Delta A\| / \|A\| equal to the relative residual for which your computed solution \hat{x} satisfies

(A + \Delta A) \hat{x} = b

exactly. If the relative residual is close to the unit roundoff (about 10^{-16} in double precision), this means that A is close to a matrix for which your computed solution is the exact solution. Since \Delta A is comparable to the unit roundoff, perturbing A by \Delta A doesn’t introduce any errors worse than those already present: whatever your exact matrix A should be, the elements of the stored A have presumably already been rounded to fit in a Float64. The matrix \Delta A is called a backward error, and having a relative backward error that is close to the unit roundoff is generally interpreted as the best you can do. (Although there are stronger componentwise versions of this that you might try for and more specialized forward error results for some problems.)
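To make the theorem concrete: the discussion here uses the ∞-norm, but in the 2-norm the minimizing perturbation can be written down explicitly as \Delta A = r\hat{x}^T/(\hat{x}^T\hat{x}) with r = b - A\hat{x}. A small sketch of checking that (the variable names are mine, purely illustrative):

using LinearAlgebra

xhat = A \ b
r = b - A*xhat
ΔA = r * xhat' / (xhat' * xhat)   # rank-one perturbation from the 2-norm version of the theorem
norm((A + ΔA)*xhat - b)           # essentially zero: xhat solves the perturbed system exactly
opnorm(ΔA) / opnorm(A)            # equals norm(r)/(opnorm(A)*norm(xhat)), the 2-norm relative residual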

For a given backward error, if you define

\delta = \frac{\|\Delta A\|}{\|A\|} = \frac{\|b - A\hat{x}\|}{\|A\| \|\hat{x}\|}

and the condition number \kappa(A) = \|A\| \|A^{-1}\| using the same matrix norm, then the standard perturbation bound says that you should expect your computed solution to satisfy

\frac{\| x - \hat{x}\|}{\|\hat{x}\|} \leq \kappa(A) \delta

That might still be horrible relative accuracy in \hat{x}, but if \delta is small, it’s inherent to your problem and more an issue with the condition number than with the algorithm. If you are unhappy with \hat{x} at that point, your recourse is to consider whether you can change the problem, perhaps by regularizing it, but also perhaps by taking a completely different approach to whatever larger problem you are trying to solve. Ill-conditioning is sometimes an artifact of a bad choice of basis (or a bad choice of something else) that results in an ill-conditioned matrix later on. However, anything you do to change your problem is going to be highly specific to what you are trying to do.
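Coming back to the bound \kappa(A)\delta above, here is how it can be evaluated numerically (a sketch, with xhat your computed solution; for a matrix this ill-conditioned, cond(A, Inf) is itself only an order-of-magnitude estimate):

δ = norm(b - A*xhat, Inf) / (opnorm(A, Inf) * norm(xhat, Inf))  # normwise relative residual
κ = cond(A, Inf)                                                # ∞-norm condition number
κ * δ                                                           # rough bound on norm(x - xhat, Inf) / norm(xhat, Inf)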

8 Likes

Thanks for such a detailed explanation.

I do have

julia> norm(A*x - b, Inf)
307.84560250442587

julia> opnorm(A, Inf)
67.36463116356485

julia> norm(x, Inf)
3.231198353882431e24

where x is the computed solution.
If I understand correctly, the “backward error” is small, which means that the result is already the best I can have.

I think my confusion is from the difference Ax - b, which can be comparable to b, even though the backward error is small.
Does this mean the linear problem Ax = b does not always have a good solution in floating-point numbers? In other words, we know that if the matrix A is nonsingular, then the linear problem has a unique solution in \mathbb{R}^n, but maybe this is not true with floating-point numbers?

By the way, you mentioned that there are theorems about this issue. Could you please give some references?

Yes. You’ve got it. Your normwise backward error is small and the normwise theory suggests you should already have the accuracy the problem merits based on its condition number. Even if the method you used to solve for x could use infinite precision and no rounding, you would expect to have comparable errors in \hat{x} due purely to storing the matrix and right-hand side in double precision.

Regarding b: sometimes people do measure residuals relative to the right-hand side, but I tried to go with a simple description. If you include b in the relative residual you have

\frac{\|A\hat{x} - b\|}{\|A\| \|\hat{x}\| + \|b\|}

and you could place backward errors on both A and b. However, since \|b\| = \|A x\| \leq \|A\| \|x\|, including \|b\| in the denominator and putting errors on b doesn’t change that measure of error much. And it turns out that putting the backward errors on A is more important than having them on b. If you try to put them all on b, then \|\Delta b\|/\|b\| would be much larger than \|\Delta A\|/ \|A\| in cases in which \|b\| \ll \|A\| \|\hat{x}\|. After multiplying by the condition number, this would make the errors on \hat{x} look much larger than they are. That’s why I was directing you away from worrying about errors relative to b.
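In code, the two measures differ only in the denominator (a sketch, with xhat again standing in for the computed solution):

r = norm(b - A*xhat, Inf)
r / (opnorm(A, Inf) * norm(xhat, Inf))                     # backward error placed on A only
r / (opnorm(A, Inf) * norm(xhat, Inf) + norm(b, Inf))      # backward error placed on both A and b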

I’ve been using cautious phrases like the “normwise theory suggests.” The reason is that there are specific cases in which you can do better if you have specific structure in A or in the backward error \Delta A. One thing that might be worth looking at is whether things look better if you try to make \Delta A small relative to A in a componentwise sense. If you measure the backward errors that way, there are different (and sometimes smaller) condition numbers that apply. One step of fixed precision iterative refinement typically gives a solution corresponding to a small componentwise backward error (with some technical conditions on the conditioning of A and the stability of the algorithm used). You could do

using LinearAlgebra

F = lu(A)        # factor A once
x = F\b          # initial solution
r = b - A*x      # residual, computed in working (fixed) precision
d = F\r          # correction using the same factorization
x = x + d        # refined solution

and the updated x will (probably) satisfy (A+\Delta A) x = b with |\Delta A| \leq k u |A|, where the absolute values and inequality are taken componentwise, k is a modest constant, and u is the unit roundoff (around 10^{-16}). You can then look at the Skeel condition number which applies when you have small componentwise errors:

condskeel(A,x,Inf)

If that is much smaller than the condition number you got before, you have (if iterative refinement worked) proportionally improved the accuracy of your solution. You can check whether iterative refinement really gave you a small componentwise backward error by recomputing r = b - Ax for the updated x and then computing

\omega(x) = \max_i \frac{|r_i|}{(|A|\cdot |x| + |b|)_i}

which is the componentwise version of the normwise relative residual above. The errors in x will then be roughly bounded by the product of \omega(x) and the Skeel condition number. It could potentially be a lot better than what you get with normwise bounds and without iterative refinement. This is all in Accuracy and Stability of Numerical Algorithms, 2nd edition, by N. Higham. I’ve mostly followed that for notation.
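A quick sketch of that check, continuing from the refinement snippet above (ω and the final product are my names for Higham’s quantities):

r = b - A*x                                           # residual for the refined x
ω = maximum(abs.(r) ./ (abs.(A)*abs.(x) .+ abs.(b)))  # componentwise backward error \omega(x)
ω * condskeel(A, x, Inf)                              # rough bound on the relative error in x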

If that doesn’t give you the accuracy you want, your ill-conditioning is tougher to deal with. You probably want to reconsider how you got A and/or whether regularization is relevant. In fact, I think most numerical analysts would argue you want to do that first. I’ve thrown a lot of the general theory your way without knowing much about your problem. The standard advice would be that ill-conditioning is often a sign that you have formulated or discretized your problem in some bad way that you should reconsider. And of course if your problem is ill-posed, you probably want to look at regularization. Focusing too much on squeezing out extra accuracy without making bigger changes can do more harm than good.

2 Likes

Does this matrix come from data, or from a model? If it’s from a model, you probably want to think whether the matrix is actually singular (and the only reason the condition number is finite is due to roundoff errors).

A singular matrix indicates that the model is simply underspecified and has non-unique solutions. For example, Poisson’s equation \nabla^2 u = f is singular for Neumann boundary conditions and (a) has non-unique solutions because you can add any constant to u and (b) is only solvable for right-hand sides f with zero mean.

In such cases, worrying about ill-conditioning is a red herring. The problem is not roundoff error, it’s mainly just deciding what solution you want to have (and ensuring that the right-hand-side is valid). It’s about linear algebra, not numerical analysis.

If the ill-conditioning comes from data and you are doing some kind of fitting, then this is where data scientists start thinking about regularization. But the fact that your residual (your backward relative error) is so small makes me suspicious that you simply have an underspecified problem.

It’s from a discretization of a complicated function. I don’t think it is really singular, but it seems very close to singular. Maybe I can impose some constraints so that it’s not so close to singular.

Thanks for reminding me of this possibility.

1 Like

Thanks a lot. I need time to digest these details, but a quick update is that the one-step iterative refinement gives roughly the same condskeel.

I think this means that the current solution is already good. I would think about how to make the matrix not so ill-conditioned. As you said, this should be the first step.

Yes. At this point, you’ve done everything you could reasonably do by way of trying to solve Ax=b with the matrix you currently have. Since I like this sort of thing, I’ve probably encouraged you to do a bit more than is entirely reasonable, given the obvious red flag of such a large condition number, which likely signals a deeper issue with your problem formulation. It’s definitely time to think about how you got that matrix.

1 Like