For non-triangular square matrices, an LU factorization is used.

For rectangular A the result is the minimum-norm least squares solution computed by a pivoted QR factorization of A and a rank estimate of A based on the R factor.

I’m curious: why is LU (with pivoting) used for square matrices, while QR is used for non-square ones?

I’ve seen statements that LU works only for square matrices, but my limited textbook understanding of the method says that it should work in all cases. And indeed, I can call lu on a non-square matrix:

julia> A = rand(5, 3);
julia> L, U, p = lu(A);
julia> A[p, :] ≈ L * U
true

Now for some reason I cannot use lu(A) \ b on a non-square matrix:

julia> b = rand(5);
julia> lu(A) \ b
ERROR: DimensionMismatch: matrix is not square: dimensions are (5, 3)

although it could work fine, at least in this case:

julia> U \ (L \ b[p]) ≈ A\b
true

So why is lu(A) \ b not implemented, and why is QR preferred for non-square matrices (but not for square matrices)?

For least-squares solutions with a non-square or singular matrix, the LU factorization is useless: it doesn’t simplify the calculation.

QR factorization, in contrast, helps you solve least-squares problems because the R factor (in the “thin” QR factorization for matrices with full column rank) is square, invertible, and triangular, and also Q^T Q = I. (There are also more technical reasons why QR factorization works well having to do with the conditioning of the linear systems, as mentioned in Efficient way of doing linear regression - #33 by stevengj)

In particular, the least-squares solution minimizing \Vert Ax - b\Vert corresponds (in theory) to solving the normal equations A^T A \hat{x} = A^T b. If you plug in A = QR, this simplifies to R\hat{x} = Q^T b (Q^T Q cancels, and R^T cancels because it is invertible), whereas for A = LU it does not (L^T L does not cancel, and U is not invertible).
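To make that concrete, here is a small check (a sketch; it assumes A has full column rank, so that the thin QR factorization has an invertible R):

```julia
using LinearAlgebra

A = rand(10, 3); b = rand(10)  # tall matrix, full column rank (almost surely)
F = qr(A)                      # unpivoted QR; A ≈ F.Q * F.R
Qthin = Matrix(F.Q)            # thin 10×3 Q with orthonormal columns
x = F.R \ (Qthin' * b)         # solve R x̂ = Qᵀ b, from the simplified normal equations
x ≈ A \ b                      # matches the built-in least-squares solve
```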

Similar things happen for minimum-norm solutions (with “wide” matrices / underdetermined problems), though in that case one sometimes uses the LQ factorization instead of QR.
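A quick illustration of the minimum-norm case (a sketch; it assumes A has full row rank, so that the L factor of the LQ factorization is invertible):

```julia
using LinearAlgebra

A = rand(3, 5); b = rand(3)  # wide matrix, full row rank (almost surely)
F = lq(A)                    # A ≈ F.L * F.Q
Qthin = Matrix(F.Q)          # thin 3×5 Q with orthonormal rows
x = Qthin' * (F.L \ b)       # minimum-norm solution x = Qᵀ L⁻¹ b
x ≈ A \ b                    # A \ b also returns the minimum-norm solution
```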

Thanks! That makes perfect sense. I suppose that’s related to why Julia doesn’t bother giving a factorization in the singular case, even though the PLU factorization exists?

The underlying LAPACK LU-factorization routine (which is used by Julia, Matlab, Scipy, etc.) returns an error code info > 0 when it can’t eliminate a zero on the diagonal of U, and Julia’s LinearAlgebra.lu function defaults to throwing an error in this case — you typically can’t do much with the factorization if it is singular.

However, you can pass check=false to the lu routine to see what LAPACK gives in a singular case:
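Something along these lines (a sketch; the exact printed form varies across Julia versions):

```julia
julia> using LinearAlgebra

julia> F = lu([1 2; 1 2], check = false);

julia> F.info   # > 0 signals a zero pivot on the diagonal of U
2

julia> F.L * F.U   # the factors themselves are still a valid factorization here
2×2 Matrix{Float64}:
 1.0  2.0
 1.0  2.0
```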

(This is the default output in scipy.linalg.lu if I understand its code correctly — the singular case corresponds to info > 0 from the LAPACK *getrf function, in which case they emit a warning rather than throwing an error.)

Note that introductory linear-algebra classes often spend an inordinate amount of time with rank-deficient LU factorizations, and often go further to something called the reduced row echelon form (rref). However, this kind of approach tends to be numerically unstable in practice — if you really want to deal with rank-deficient systems, look at nullspaces, and so on, a more practical tool is typically the SVD.
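For instance, a numerical rank and a nullspace basis from the SVD (a sketch; the tolerance choice mimics the usual convention of comparing singular values against the largest one):

```julia
using LinearAlgebra

A = [1.0 2.0; 1.0 2.0]             # singular matrix, rank 1
U, S, V = svd(A)
tol = maximum(size(A)) * eps(S[1]) # tolerance relative to the largest singular value
r = count(>(tol), S)               # numerical rank: singular values above tol
N = nullspace(A)                   # orthonormal nullspace basis (computed via the SVD)
r == 1 && norm(A * N) ≤ 1e-10      # A annihilates the nullspace basis
```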

See [LinearAlgebra] LU-factorization failure (0.7.0-alpha.0) · Issue #27657 · JuliaLang/julia · GitHub for further discussion about this. Unfortunately, Julia conflates the notion of a failed factorization (which can happen with NoPivot()) with that of a valid factorization of a singular matrix. This conflation is made both in the behavior of the check keyword and in the display of the factorization object (which always displays as Failed factorization when info > 0, even when the factorization is valid).

Thanks for the link, I’ve made a PR to hopefully fix it…

By the way, Matlab and Octave execute lu([1 2; 1 2]) without complaining. They will also give a solution to [1 2; 1 2] \ [0; 1] (with a singularity warning) rather than throwing an error like Julia does, and as I would expect after seeing this diagram from the Matlab documentation. The diagram ends at the LU factorization for a general square matrix, but there must be some further processing in the singular case…
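If you want a solution in the singular square case in Julia, you can ask for it explicitly with a rank-revealing factorization instead of LU (a sketch; ColumnNorm() requests column-pivoted QR, which detects the rank deficiency and still returns a least-squares solution):

```julia
using LinearAlgebra

A = [1 2; 1 2]; b = [0; 1]
x = qr(A, ColumnNorm()) \ b  # rank-revealing QR gives a least-squares solution
# (A \ b itself errors out here, since for square A it goes through LU)
norm(A * x - b) ≈ norm(A * (pinv(A) * b) - b)  # same (minimal) residual as pinv
```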