Gram-Schmidt decomposition

Hi! Is there any specific module in Julia for performing linear algebra operations like Gram-Schmidt orthogonalization for a given set of vectors?

1 Like

qr in the LinearAlgebra standard library does this.

(Technically, Gram–Schmidt is just one possible algorithm for QR factorization; in practice linear-algebra libraries tend to use different algorithms instead.)

5 Likes

Hi! Thanks for the reply; however, could you elaborate a bit more?
Specifically, how exactly do I obtain the orthonormalized vectors from the given set of vectors?

P.S. ~ I knew nothing about QR decomposition up to this point.

1 Like

Do using LinearAlgebra to load the LinearAlgebra module, then do Q = Matrix(qr(A).Q) where A is a matrix whose columns are the vectors you want to orthonormalize. This yields a matrix Q whose columns are the orthonormalized columns of A, equivalent (modulo roundoff errors) to Gram–Schmidt on the columns of A (also called the “thin” QR factor).
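
For example (with a random A just for illustration):

using LinearAlgebra

A = rand(5, 3)               # columns are the 3 vectors to orthonormalize
Q = Matrix(qr(A).Q)          # 5×3 “thin” Q whose columns are orthonormal
Q' * Q ≈ Matrix(I, 3, 3)     # true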

9 Likes

Thanks for clarifying! I’m new to Julia and didn’t know this stuff before now.

1 Like

No problem. This is not specific to Julia, but a lot of introductory linear-algebra courses don’t emphasize the connection of Gram–Schmidt to QR factorization.

Often, introductory linear algebra classes tend to emphasize simple (perhaps simplistic) algorithms for hand calculation, whereas practical numerical linear algebra emphasizes matrix factorizations. For example, LU factorization (instead of Gaussian elimination), QR factorization (instead of Gram–Schmidt), and diagonalization (instead of finding roots of characteristic polynomials).

Moreover, one often doesn’t work with the factors directly as matrices, but instead treats them as linear operators. For example, when working with QR factors (i.e. orthogonalized bases), typically one doesn’t need to look at the elements of Q explicitly, but instead one only needs to multiply by Q or Q^T. Julia’s qr(A).Q is actually an object that doesn’t literally store the elements of Q, but can be multiplied quickly by vectors and matrices. That’s why Matrix(qr(A).Q) is required if you want an explicit matrix of orthogonal vectors — in large-scale problems, you typically try to avoid this and instead work with Q implicitly as an operator.
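
For instance (sizes picked arbitrarily), a small sketch of what that looks like:

using LinearAlgebra

A = rand(1000, 20)
F = qr(A)
F.Q isa LinearAlgebra.AbstractQ   # true: a compact operator built from Householder reflectors, not a dense 1000×1000 matrix

x = rand(1000)
y = F.Q' * x                      # applies Q' to x without ever forming Q explicitly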

27 Likes

Here is an implementation: Gram–Schmidt Orthogonalization with Julia - Julia Community 🟣

1 Like

Note that this is “classical” Gram–Schmidt (CGS), which is numerically unstable (unless you do it twice). There is a simple variant called “modified Gram–Schmidt” (MGS) that fixes this with a one-line change. Julia implementations of both CGS and MGS, along with an illustration of the roundoff errors, are given in my MIT course notebook.
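
For reference, here’s a rough sketch of the modified variant (illustrative only, not the notebook’s code; mgs is just a made-up name):

using LinearAlgebra

# Modified Gram–Schmidt: orthonormalize the columns of A (assumed full column rank).
# The key difference from CGS: projections use the already-updated columns, not the originals.
function mgs(A)
    n = size(A, 2)
    Q = float(copy(A))                        # working copy, promoted to floating point
    R = zeros(eltype(Q), n, n)
    for j in 1:n
        R[j, j] = norm(Q[:, j])
        Q[:, j] ./= R[j, j]                   # normalize the j-th column
        for k in j+1:n
            R[j, k] = dot(Q[:, j], Q[:, k])   # coefficient against the new q_j
            Q[:, k] .-= R[j, k] .* Q[:, j]    # immediately remove the q_j component
        end
    end
    return Q, R
end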

As explained above and mentioned in the notebook, however, you are normally going to be much better off using qr(A), which computes the same result (accurately) via the Householder algorithm and is much faster than a textbook Gram–Schmidt implementation (whether classical or modified).

8 Likes

This has come up before. You might look at this and this.

There is also a fast orthonormalization via the Cholesky decomposition, though one must be a bit careful: it’s not as numerically stable as the QR method.

using LinearAlgebra
A = rand(5,3)  # column vectors of length 5
U = cholesky(A'A).U
B = A*inv(U)   # orthonormal column vectors spanning the same subspace.

alternatively

L = cholesky(A'A).L
B = (L\A')'

Note that this is not faster in the complexity sense — all of these algorithms are \Theta(mn^2) for an m \times n matrix A. But it is probably faster in actual time (i.e. a better “constant coefficient” of the mn^2) for m \gg n, similar to solving least-squares problems by the normal equations as discussed in Efficient way of doing linear regression - #33 by stevengj … at the expense of accuracy as you point out.

Technically, there is no such thing as “not as stable”. It’s a boolean choice: an algorithm is either numerically stable or not. It’s a question of whether the (backward) error goes to zero (at worst) linearly with the precision \varepsilon or not, i.e. whether the backward error is O(\varepsilon).

This algorithm (which squares the condition number of A) is definitely much less accurate than Householder QR or modified Gram–Schmidt. I don’t know offhand whether it is numerically unstable (i.e., whether its backward errors decrease more slowly than linearly), but my guess is that it is probably unstable.

Aside: One should get out of the habit of using matrix inverses for numerical calculations. One should instinctively do A / U rather than A * inv(U). In this case, A / U exploits the fact that U is upper triangular to avoid the O(n^3) complexity of inversion.
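
For a rough illustration of both points (the ill-conditioned A below is made up for the example):

using LinearAlgebra

m, n = 100, 10
A = rand(m, n) * Diagonal(exp10.(range(0, -5, length=n)))   # scale columns so κ(A) is large (roughly 1e6)

Qqr   = Matrix(qr(A).Q)                # Householder QR
U     = cholesky(Symmetric(A'A)).U     # Symmetric(...) guards against tiny asymmetries in A'A
Qchol = A / U                          # triangular solve: same result as A * inv(U) without forming the inverse

norm(Qqr' * Qqr - I)                   # ≈ machine precision
norm(Qchol' * Qchol - I)               # orders of magnitude worse once κ(A) is large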

6 Likes

I don’t think there is much written about the stability of the Cholesky approach, perhaps because it isn’t taken very seriously as a useful algorithm. There is a throwaway comment in a fairly well-cited technical report suggesting that it is better than classical Gram–Schmidt without reorthogonalization, but that seems to be stated mostly to disparage Gram–Schmidt without reorthogonalization.

To summarize the usual error bounds, the backward factorization errors \| A - QR\| / \|A\| are nicely bounded as O(u) for unit round-off u in classical Gram–Schmidt, modified Gram–Schmidt, Cholesky, and Householder and Givens QR. The difference is in the numerical orthogonality of the computed Q. For that, classical Gram–Schmidt satisfies no useful bound. For modified Gram–Schmidt it is O(u) \kappa(A). And for Householder and Givens it’s just O(u), with some factors depending on matrix size. Based on a quick and dirty error analysis, I’m pretty sure that for Cholesky it is O(u)\kappa^2(A), but I’ve never seen that written down anywhere. So I think it’s somewhere between classical and modified Gram–Schmidt. And you don’t have the option for reorthogonalization that you have with CGS. As far as I know, it’s an algorithm without any compelling use.
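
In symbols (with u the unit round-off and \kappa(A) the condition number), the loss-of-orthogonality bounds just described are roughly:

\|I - Q^T Q\| = \begin{cases}
  O(u)\,\kappa(A)   & \text{modified Gram–Schmidt} \\
  O(u)\,\kappa(A)^2 & \text{Cholesky (conjectured above)} \\
  O(u)              & \text{Householder / Givens QR}
\end{cases}
\qquad \text{(no useful bound for classical Gram–Schmidt)}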

perhaps because it isn’t taken very seriously as a useful algorithm

This isn’t quite correct. Cholesky is great where applicable (only for matrices that you know are already positive definite, and with relatively small condition numbers). It’s less stable than QR, but it will be ~3x faster.

2 Likes

Here you are applying it to A'A, so it’s always applicable (in theory), but it squares the condition number. (Unlike applying Cholesky to solve Ax=b where A is SPD.)

Fair enough. I probably should have qualified that to be more about generally applicable algorithms.

You don’t need to worry about positive definiteness in the well-conditioned case: A^T A is always mathematically positive definite (assuming A has full column rank), although Cholesky might still fail numerically if A is ill-conditioned.