Please correct me if I’m wrong as I fiddle around. As far as I can see, if A is an m \times n non-square matrix and b is an m \times 1 vector, then A\b returns the minimal norm minimal least square solution. That is, consider the image hyperplane Im(A). There is a unique point b_0 on the hyperplane that is minimal distance from b (b_0 can equal b itself). Then the equation Ax=b_0 has at least one solution, and you can take the minimal norm solution x.

However, this does not apply when A is square. If A is square and non-singular, A\b returns the unique solution, great. But when A is square and singular, A\b returns an error, whether b is in Im(A) or not. This seems mathematically inconsistent to me. In my opinion, A\b should return a warning rather than an error if A is singular. The warning could read “A is singular, and in this case, Ax=b has no solutions (or infinitely many solutions) and this algorithm is returning the minimal norm LS solution, not any unique solution…”

Once rounding is involved, deciding if b is in \mbox{Im}(A) is not an entirely simple thing. Even deciding if A is rank deficient is not simple. Both require setting some sort of threshold to decide if b is close to \mbox{Im}(A) or A is close to being rank deficient. My own opinion is that those thresholds are usually problem specific and are best chosen by the user. My preference would actually be the opposite: I’d prefer that everything behave more like the square case by default with a well exposed and relatively high level way to provide a tolerance for making rank decisions if the user wants to do so. At the moment, unless I’m missing something, the access to that tolerance appears to be through the method

ldiv!(A::QRPivoted{T}, B::AbstractMatrix{T}, rcond::Real) where T<:BlasFloat

which seems very low level to me for something that can be very important in some cases. Then by default A\b would always provide a least squares solution for tall A, a minimum norm solution for wide A, and the unique solution for the square case, with an error in the case the computation really has a hard failure due to rank deficiency.

However, I’m aware that there is a counter argument that the current behavior is an attempt to silently do what the user probably wants and my way would be a foot gun for anyone who doesn’t know they have a numerically rank deficient problem. So maybe I’m just a control freak who is OK with having dangerous defaults.

A lot of the conventions for numerical linear algebra have been around for a long time and it’s usually a good thing to have implementations be consistent with these, so it’s not like Julia decided to do something here that’s unusual. Also often these algorithms call lower level algos like in this case Lapack’s dgesv.

With numpy for instance:

>>> import numpy as np
>>> a = np.array([[1, 0],[0, 0]])
>>> b = np.array([1,0])
>>> np.linalg.solve(a, b)
(...)
raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix

I don’t think deciding if b is in Im(A) is too hard, unless I’m mistaken. First, b_0 can be determined efficiently via least squares whether or not b “truly” lies in Im(A) or not. And second, I suppose you could use isapprox(b0,b) if you wanted a readout whether there are solutions or not.

It sounds easy, and in a sense it is easy to do the computation, but the part that bugs me is with the “truly” part and setting the tolerance for that. It can be tough if \mbox{Im}(A) is sensitive to perturbations in A. Suppose you have a matrix A that has rank 2 with some additional errors of order 10^{-16} thrown on top of that. And suppose your vector b has similar errors. Consider the following computation

The elements 10^{-16} correspond to my error. This computation suggests that b = 1.0 u_2 - 5\times 10^{-9} u_3 where u_3 is orthogonal to the numerically computed \mbox{Im}(A). It appears to have a component orthogonal to \mbox{Im}(A) that is 7 orders of magnitude larger than the errors in A and b. So if I look at this, I’m going to conclude that b is not in \mbox{Im}(A).

However for my error-free version of A and b I get:

So this would lead me to the opposite conclusion. Should I relax my tolerance in the first computation to something more like 10^{-8} to get the “right” answer? Should a routine with a built-in tolerance make that decision for me?

You’ll get similar issues with any method that tries to compute a basis for \mbox{Im}(A) before checking if b is in that subspace. You can do better by making a rank decision on the augmented matrix \left[ \begin{array}{cc} A & b \end{array} \right] instead of on just A. But at that point you are computing a relatively expensive rank revealing factorization of both A and b. If you have a new right-hand side, you can’t reuse the factorization. You’re slowing down a simple computation and giving up one of the big advantages of using a matrix factorization. And you are still going to have to use a tolerance that might be different from what the user wanted. Going down this road in a consistent and reliable way is just not worth it, in my opinion.