Constructing Q matrix from QR decomposition

Hi!
I am working with some large complex matrices (up to 16000×16000). I need to do a QR decomposition of a rectangular matrix, and I require the Q matrix to be square. It seems that extracting the Q matrix is almost as expensive as doing the QR decomposition itself. Here is an MWE:

using LinearAlgebra
data = []
for n = 2:6  # renamed from `size` to avoid shadowing Base.size
    test_matrix = rand(ComplexF64, 4^n, 4^(n-1))
    start_time = time()
    Q, R = qr(test_matrix)
    t_qr = time() - start_time  # record before computing the norm, so it isn't included
    push!(data, t_qr)
    println("Size ", 4^n, " Time taken for QR ", t_qr, " Norm ", norm(Q*R - test_matrix))
    start_time = time()
    Q = Matrix(Q)
    println("Time taken for Complex conversion ", time() - start_time)
    flush(stdout)
end

The output (on an 11th-gen i5 laptop with 4 threads utilised) is below:

Size 16 Time taken for QR 0.00011110305786132812 Norm 2.052986419474234e-15 Time taken for Complex conversion 1.1920928955078125e-5

Size 64 Time taken for QR 0.0002079010009765625 Norm 1.0968218527953343e-14 Time taken for Complex conversion 0.0003440380096435547

Size 256 Time taken for QR 0.001828908920288086 Norm 8.146303347292701e-14 Time taken for Complex conversion 0.0008859634399414062

Size 1024 Time taken for QR 0.026747941970825195 Norm 3.031789555168408e-13 Time taken for Complex conversion 0.054368019104003906

Size 4096 Time taken for QR 0.42181897163391113 Norm 1.3781341883372822e-12 Time taken for Complex conversion 0.7480239868164062

These timings are from a subsequent run, to exclude compilation time. But the Q obtained here is not square. In my actual calculation I am using

 Q=ComplexF64.(Q)

instead of

Q=Matrix(Q)

The former gives a square matrix but is far slower than the latter (as pointed out here & here).

Subsequently, I need to do a matrix multiplication with the Q obtained here, as below:

M=Q1*S*Q2

where S is a diagonal matrix (with the proper dimension) and Q2 is another matrix obtained from a separate QR decomposition. Is there any way to obtain the square Q matrix cheaply? I need not just matrix-vector products but matrix-matrix multiplication, so I can’t use the compact Q object directly.

Any help will be much appreciated. Thank you.

Matrix(F.Q) gives you the “thin” QR’s Q, whose number of columns equals the number of columns of the original matrix, as explained in the documentation. If you want the “full” square Q, the docs say:

To retrieve the “full” Q factor, an m×m orthogonal matrix, use F.Q*I or collect(F.Q)
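For example (a small sketch with made-up dimensions, just to show the shapes), the thin and full factors come from the same factorization and differ only in how much of Q you materialize:

```julia
using LinearAlgebra

A = rand(ComplexF64, 8, 3)
F = qr(A)
Q_thin = Matrix(F.Q)  # 8×3: an orthonormal basis for the column space of A
Q_full = F.Q * I      # 8×8: the full unitary matrix
@show size(Q_thin) size(Q_full)
```

The first 3 columns of `Q_full` are exactly `Q_thin`; both are materialized from the same Householder reflectors stored in `F.Q`.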

Then why do you need to form the Q matrix explicitly? The whole point of the implicit representation of Q returned by the qr algorithm (as a composition of Householder reflections) is that you can multiply by Q (or Q') quickly, without the expensive step of computing and storing the entries of the Q matrix.


Thank you so much for the response.
Yeah, in the end I need to do a matrix multiplication. But I find it too slow when doing that with Q directly instead of first constructing a dense matrix from the QRCompactWYQ object. An MWE is given below:

n = 5
test_matrix = rand(ComplexF64, 4^n, 4^(n-2))
Q, R = qr(test_matrix)
randvec = rand(4^n)
A = Matrix(Q*I)
println(transpose(A)*Diagonal(randvec)*A) # This is faster

println(transpose(Q)*randvec*Q) # This takes minutes

Also, should I just use qr! instead of qr to reduce allocation?

This doesn’t run because transpose isn’t implemented for the implicit representation of Q:

ERROR: transpose not implemented for LinearAlgebra.QRCompactWYQ{ComplexF64, Matrix{ComplexF64}, Matrix{ComplexF64}}. Consider using adjoint instead of transpose.

Presumably, you mean the conjugate-transpose adjoint(Q), or equivalently Q', instead, since that is a more sensible operation on Q in the complex case.
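A minimal illustration (with arbitrary small dimensions): the implicit Q supports `Q'` directly, applying the stored reflectors without ever forming the dense matrix, whereas `transpose(Q)` throws the error above:

```julia
using LinearAlgebra

F = qr(rand(ComplexF64, 8, 3))
Q = F.Q
x = rand(ComplexF64, 8)
y = Q' * x   # fast: applies the Householder reflectors; no dense Q is formed
# transpose(Q) would throw, as in the error message above
```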

Once I use adjoint, use BenchmarkTools.jl for proper benchmarking, and don’t time the println (which takes a long time for a huge matrix!), the timings look fine.

This code:

using LinearAlgebra, BenchmarkTools

n = 3
test_matrix = rand(ComplexF64, 4^n, 4^(n-2))
randvec = rand(4^n);
Q, R = @btime qr($test_matrix);
A = @btime Matrix($Q*I);
@btime $A' * Diagonal(randvec) * $A;
@btime $Q' * Diagonal(randvec) * $Q;
@btime $Q' * diagm(randvec) * $Q;

gives:

julia> Q, R = @btime qr($test_matrix);
  4.574 μs (3 allocations: 4.78 KiB)

julia> A = @btime Matrix($Q*I);
  17.931 μs (5 allocations: 132.22 KiB)

julia> @btime $A' * Diagonal(randvec) * $A;
  116.552 μs (6 allocations: 128.12 KiB)

julia> @btime $Q' * Diagonal(randvec) * $Q;
  33.273 μs (9 allocations: 136.42 KiB)

julia> @btime $Q' * diagm(randvec) * $Q;
  30.414 μs (10 allocations: 168.45 KiB)

That is, the slowest by far is forming the dense matrix and multiplying by it.

This is not surprising, because the time to multiply an $m \times m$ matrix by a dense $m \times m$ matrix $Q$ is $O(m^3)$, whereas the time to multiply an $m \times m$ matrix by the implicit representation of $Q$ in terms of Householder reflectors is $O(m^2 n)$, where $n$ is the number of columns of the original matrix that you QR-factored. Here $m = 16n$, so it’s not surprising that this saves an order of magnitude.
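Plugging in $m = 16n$ makes the gap explicit:

$$\frac{O(m^3)}{O(m^2 n)} = \frac{m}{n} = 16,$$

i.e., the dense product does about $16\times$ more arithmetic in the leading term, consistent with the benchmark above.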
