Orthonormalize a matrix in place


For a given matrix M \in \mathbb{R}^{m \times n}, m \geq n , I would like to orthonormalize the columns of M in place.

M \leftarrow \texttt{qr}(M)\texttt{.Q}

I see that the function LinearAlgebra.qr! aims at something similar, but I cannot figure out what will be put in the matrix M (will it be R, or Q or something else ?). From experiments it feels that the following is put

  • the R factor on the upper triangular part of the matrix,
  • the reflector on the rest

It makes sens because this is how it is done with LAPACK.

Is there a way to put the economic (meaning m \times n) Q factor in M without using extra memory ? I should say that the R factor is not needed in my usecase.

I am not a numerical analyst, but I would guess that if you aren’t concerned with numerical stability or speed, you could just write your own Gram-Schmidt that updates in-place. Maybe that is literally never advisable, but at least in principle something like that would give you what you want.

The Householder QR algorithm (used by the qr function) does not compute or store the Q matrix explicitly — it uses an implicit representation of Q in terms of a sequence of elementary reflectors. (In most applications, you don’t need Q explicitly, only a way to multiply any given vector or matrix by Q or Q^*.)

You could implement modified Gram–Schmidt yourself, of course, as @cgeoga suggested, but there is some numerical loss of orthogonality that might bite you unless you use it carefully.

What are you trying to do? What do you need Q for?


If you orthogonalize with classical Gram-Schmidt twice you’ll be fine. The GMRES code in ```SIAMFANLEquations.jl```` does it that way. That code, and the orthogonalizer live in

Here is a quick and dirty qr code that uses the orthogonalizer and overwrites A with Q.
The default option, cgs2 is classical Gram-Schmidt twice, which is the best choice on any modern computer.

function qrctk!(A, orth = "cgs2")
    T = typeof(A[1, 1])
    (m, n) = size(A)
    R = zeros(T, n, n)
    @views R[1, 1] = norm(A[:, 1])
    @views A[:, 1] /= R[1, 1]
    @views for k = 2:n
        hv = vec(R[1:k, k])
        Qkm = view(A, :, 1:k-1)
        vv = vec(A[:, k])
        Orthogonalize!(Qkm, hv, vv, orth)
    return (Q = A, R = R)
1 Like

Thank you for these answers.

@ctkelley @cgeoga Having already tried to implement the Gram-Schmidt algorithm in the past, I fear I won’t be able to match the stability and the performance of the library’s QR method. Thus my question was toward using the existing function of LinearAlgebra to achieve my goal, if possible.

@stevengj Here is a usecase

I want to create an orthogonal matrix distributed on different processors, but this matrix can be very large, so it shouldn’t be generated all at once and then split the columns.

I generate the different partitions on the different processors, and then I orthogonalize the partitions against one another.

function orthonormalize(A::DArray)
  for p in workers()
    # Compute the local orthonormal partition
    @spawnat p A[:l] = Matrix(LinearAlgebra.qr(A[:l]).Q)

    # On processors p+1 .. P, orthogonalize against the basis of p
    for p2 in workers()[p:length(workers())]
      @spawnat p2 begin
        A_on_p = @fetchfrom p A[:l]
        A[:l] = (I - A_on_p * A_on_p') * A[:l]

# main
U = drand((m,m), workers(), [1,4])

See the line 4, instead I would like to perform the inplace orthonormalization of A[:l] (the local part of the input matrix).

If I use @spawnat p LinearAlgebra.qr!(A[:l]) on line 4 instead, I don’t understand how to use A[:l] afterwards to orthogonalize the other partitions, because it contains only the reflectors. I cannot do A_on_p * A_on_p' for example.

You are correct that GS is slower than the Householder reflection QR in qr! GS is not the way to compute QR in general. However, recovering Q from the output of qr! is expensive.

Classical GS twice has very good parallel performance and is stable, more stable that the variants of modified GS in the codes. GMRES in Trilinos, for example, is done that way,

I found out that projecting when using the QR compressed format ( QRCompactWY ) was never ending, rather than using the full matrix format was working

julia> using LinearAlgebra
julia> A = rand(1000,100)
julia> fact = qr!(A)
julia> (I - fact.Q * fact.Q') # Computing the projector in this way is very long (I didn't see the end)
julia> Q = Matrix(fact.Q)
julia> I - Q * Q' # In a matrix form, it takes a few ms

I would have expected using the compressed format to be faster, because it use householders transformations. I don’t see yet why it would be so long in practice.

The compressed format is designed for the circumstance where you don’t try to explicitly compute the matrix. For example, the projector P = I - fact.Q * fact.Q' should never be explicitly computed — rather, P*x should be computed implicitly as x - fact.Q * (fact.Q'*x) or similar.


@stevengj thank you for this remark, it makes sens indeed.

Using parenthesis like you pointed out solved the problem of undefinite running.

This led me to another problem, now I can compute the result of the projection with both formats, but the results do not concord

julia> using LinearAlgebra
julia> A = rand(1000,100)
julia> fact = qr!(A)
julia> B = rand(1000,100)
julia> B1 = B - fact.Q * (fact.Q' * B) # B1 uses the QRCompactWY format
julia> Q = Matrix(fact.Q)
julia> B2 = B - Q * (Q' * B) # B2 uses the explicit matrix format
julia> norm(B1 - B2, Inf)
0.8313667296907561 # The output shows both give very different results

I check the results in B1 and B2, and only B2 contains the expected result

I think you mean only B2 contains the expected result?

I forgot that fact.Q' * B by default computes the product of the “full” (square) Q^* matrix with B — so that QQ^* B = B — whereas you want the “thin” Q matrix result \hat{Q}^* B, corresponding to the first 100 rows of Q^* B:

B3 = B - fact.Q * (fact.Q' * B)[axes(B,2),:];

(I’m not sure offhand if there is a more efficient way to get this?)


Exactly I meant B2, sorry.

Ok this explain the difference, thank you !

For information I had a speedup of circa 2 to 3 with using the compact format.