Even if you want the QR factors too, you can take the factorization object and use `\` for the least-square solution:
```julia
julia> using LinearAlgebra

julia> X = rand(100,3); y = rand(100);

julia> F = qr(X);

julia> F \ y ≈ X \ y ≈ F.R \ (Matrix(F.Q)' * y) ≈ F.R \ (F.Q' * y)[1:size(X,2)]
true
```
One of the tricky things about the QR factorization is that there are two variants when the $m \times n$ matrix $X$ is “tall” ($m > n$), as is typical in least-square problems: the “thin” variant $X = \hat{Q}\hat{R}$, in which $\hat{Q}$ is the same “tall” $m \times n$ size as $X$ and $\hat{R}$ is square, and the “full” variant $X = QR$, in which $Q$ is a square unitary $m \times m$ matrix and $R$ is “tall” $m \times n$. If you do `F = qr(X)`, then:
- `F.Q` represents the square unitary matrix $Q$ from the “full” QR factorization. (This is also what you get from `Q, R = qr(X)`.) Technically, it doesn’t actually store this matrix explicitly; instead, it has a compact representation of the linear operator $Qx$, so that `F.Q * x` and/or `F.Q' * y` ($= Q^* y$) can be computed quickly. (Essentially, this is what the underlying numerical algorithms “natively” give you.)
- `Matrix(F.Q)` gives you $\hat{Q}$, the “thin” QR factor, as an explicit matrix. Computing this is not free, however: it basically works by multiplying `F.Q` by the first $n$ columns of the identity matrix $I$, so you generally avoid constructing it if you can (see the size check below).
- `F.R` always gives you the square upper-triangular matrix $\hat{R}$ from the thin QR. (This is also what you get from `Q, R = qr(X)`.)
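To make the full-vs-thin distinction concrete, here is a quick size check reusing the `X` and `F` from the snippet above (the numbers assume the 100×3 `X` defined there):

```julia
julia> size(F.Q), size(Matrix(F.Q)), size(F.R)  # full Q vs. thin Q̂ vs. square R̂
((100, 100), (100, 3), (3, 3))

julia> X ≈ Matrix(F.Q) * F.R  # the thin factors reassemble X
true
```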
When you find the least-square solution from QR, what you want is $\hat{R}^{-1} \hat{Q}^* y$ using the thin QR factors. In particular, this means that if you want to compute $\hat{Q}^* y$ you cannot simply use `F.Q' * y`; instead, you need to either use something like `Matrix(F.Q)' * y` or, more efficiently, `(F.Q' * y)[1:n]`.
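For example, continuing the same session with `F` and `y`: applying the implicit full `Q'` gives a length-$m$ vector, and only its first $n$ entries are $\hat{Q}^* y$:

```julia
julia> n = size(X, 2);

julia> length(F.Q' * y)  # the implicit Q is 100×100, so Q'y has length m = 100
100

julia> (F.Q' * y)[1:n] ≈ Matrix(F.Q)' * y  # the first n = 3 entries are Q̂'y
true

julia> F.R \ (F.Q' * y)[1:n] ≈ X \ y  # least-square solution R̂⁻¹ Q̂'y
true
```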
It’s generally better to use `F \ y`, or even better `X \ y` (which uses a pivoted QR factorization automatically), to solve least-square problems:
- `F \ y` will deal with the size of the Q matrix for you efficiently. It will also do the upper-triangular solve $\hat{R}^{-1} \text{(vector)}$ efficiently: since this is an upper-triangular system, you can solve it in $O(n^2)$ operations using backsubstitution. (Get out of the habit of inverting matrices! `inv(A) * b` is almost never going to be the best way to solve a linear system.)
- `X \ y` is even better, because it defaults to a pivoted QR factorization, corresponding to `F = qr(X, ColumnNorm())`, which deals better with a badly conditioned `X` (e.g. nearly parallel columns). That’s not what `qr(X)` gives you by default because it involves an extra permutation matrix compared to ordinary QR, i.e. it is a factorization $XP = QR$ where $P$ is a permutation (see the sketch after this list). See also Efficient way of doing linear regression - #33 by stevengj.
- Even if you want to explicitly compute $\hat{R}^{-1} z$ for some $z$, it makes my eyes water to see people write `inv(R) * z` (in Julia) or `inv(R).dot(z)` (in Python). Get out of the habit of inverting matrices. At the very least, do `R \ z` (in Julia) or `scipy.linalg.solve(R, z)` (in Python). Better yet, take advantage of the fact that `R` is upper-triangular by doing `UpperTriangular(R) \ z` (in Julia) or `scipy.linalg.solve_triangular(R, z)` (in Python), as illustrated below.
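To illustrate the pivoted factorization, here is a short sketch reusing the `X` and `y` from the first snippet (`Fp` is just a name for the pivoted factorization object here). The main thing to notice is that the permutation `Fp.p` must be undone if you unpack the factors by hand:

```julia
julia> Fp = qr(X, ColumnNorm());  # column-pivoted QR: X[:, Fp.p] ≈ Q̂ R̂

julia> X[:, Fp.p] ≈ Matrix(Fp.Q) * Fp.R
true

julia> Fp \ y ≈ X \ y  # the factorization object handles the permutation for you
true

julia> (Fp.R \ (Matrix(Fp.Q)' * y))[invperm(Fp.p)] ≈ X \ y  # "by hand": solve, then un-permute
true
```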
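And for the triangular solves in the last point, a minimal Julia illustration (the `R` and `z` names are just placeholders for whatever upper-triangular system you have, taken from the earlier factorization): all three expressions give the same answer, but wrapping the matrix in `UpperTriangular` goes straight to $O(n^2)$ backsubstitution, whereas `inv(R) * z` forms the whole inverse first.

```julia
julia> R = F.R; z = Matrix(F.Q)' * y;  # an example upper-triangular system R \ z

julia> UpperTriangular(R) \ z ≈ R \ z ≈ inv(R) * z  # same answer; the triangular solve is the cheap one
true
```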