# Constructing Q matrix from QR decomposition

Hi!
I am working with some large complex float matrices (up to 16000×16000). I need to do a QR decomposition of a rectangular matrix, and I require the Q matrix to be square. It seems that extracting the Q matrix is almost as expensive as the QR decomposition itself. Here is an MWE:

```julia
using LinearAlgebra
data = []
for n in 2:6  # renamed from `size` to avoid shadowing Base.size
    test_matrix = rand(ComplexF64, 4^n, 4^(n-1))
    start_time = time()
    Q, R = qr(test_matrix)
    push!(data, time() - start_time)
    println("Size ", 4^n, " Time taken for QR ", time() - start_time, " Norm ", norm(Q*R - test_matrix))
    start_time = time()
    Q = Matrix(Q)
    println("Time taken for Complex conversion ", time() - start_time)
    flush(stdout)
end
```


The output (on an 11th-gen i5 laptop with 4 threads utilised) is below:

```
Size 16   Time taken for QR 0.00011110305786132812  Norm 2.052986419474234e-15   Time taken for Complex conversion 1.1920928955078125e-5
Size 64   Time taken for QR 0.0002079010009765625   Norm 1.0968218527953343e-14  Time taken for Complex conversion 0.0003440380096435547
Size 256  Time taken for QR 0.001828908920288086    Norm 8.146303347292701e-14   Time taken for Complex conversion 0.0008859634399414062
Size 1024 Time taken for QR 0.026747941970825195    Norm 3.031789555168408e-13   Time taken for Complex conversion 0.054368019104003906
Size 4096 Time taken for QR 0.42181897163391113     Norm 1.3781341883372822e-12  Time taken for Complex conversion 0.7480239868164062
```


These timings are from a subsequent run, to exclude compilation time. But the Q obtained here is not square. In my actual calculation I am using

```julia
Q = ComplexF64.(Q)
```

instead of

```julia
Q = Matrix(Q)
```


The former gives a square matrix but is far slower than the latter (as pointed out here & here).

Subsequently, I need to do a matrix multiplication with the Q obtained here, as below:

```julia
M = Q1 * S * Q2
```


where S is a diagonal matrix (with the proper dimension) and Q2 is another matrix obtained from a separate QR decomposition. Is there any way to obtain the square Q matrix cheaply? I don't need just matrix-vector products but matrix-matrix multiplication, so I can't use the compact Q object directly.
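For concreteness, the product above can be formed with LinearAlgebra's `Diagonal` type so that S is never stored densely. This is a sketch of the described computation; the sizes and the names `Q1`, `Q2`, `s` are placeholders, not taken from the thread:

```julia
using LinearAlgebra

m = 64
# two square Q factors from separate QR decompositions of tall matrices
Q1 = qr(rand(ComplexF64, m, m ÷ 4)).Q * I
Q2 = qr(rand(ComplexF64, m, m ÷ 4)).Q * I
s = rand(m)                  # diagonal entries of S
M = Q1 * Diagonal(s) * Q2    # S is applied as a diagonal, never materialized densely
```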

Any help will be much appreciated. Thank you.

`Matrix(QR.Q)` gives you the "thin" QR's Q, whose number of columns equals the original number of columns, as explained in the documentation. If you want the "full" QR's square Q, as explained in the docs you should do:

> To retrieve the "full" Q factor, an m×m orthogonal matrix, use `F.Q*I` or `collect(F.Q)`
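A minimal illustration of the thin vs. full Q (sizes arbitrary):

```julia
using LinearAlgebra

A = rand(ComplexF64, 16, 4)
F = qr(A)
Q_thin = Matrix(F.Q)   # 16×4: same number of columns as A
Q_full = F.Q * I       # 16×16: the full unitary factor
```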

But then why do you need to form the Q matrix explicitly? The whole point of the implicit representation of Q returned by the qr algorithm (as a composition of Householder reflections) is that you can multiply by Q (or Q') quickly, without the expensive step of computing and storing the entries of the Q matrix.
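As a sketch: applying the implicit Q (or its adjoint) to a matrix uses the stored Householder reflectors directly, so the square Q is never materialized, yet the result matches the explicit dense product:

```julia
using LinearAlgebra

A = rand(ComplexF64, 1024, 64)
F = qr(A)
B = rand(ComplexF64, 1024, 8)
C_implicit = F.Q' * B        # applies the reflectors; no 1024×1024 matrix is formed
C_explicit = (F.Q * I)' * B  # same product via the dense square Q
```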


Thank you so much for the response.
Yeah, I ultimately need to do a matrix multiplication. But I find it too slow when doing that with Q directly instead of first constructing a matrix from the QRCompactWYQ object. An MWE is given below:

```julia
using LinearAlgebra
n = 5  # renamed from `size` to avoid shadowing Base.size
test_matrix = rand(ComplexF64, 4^n, 4^(n-2))
Q, R = qr(test_matrix)
randvec = rand(4^n)
A = Matrix(Q*I)
println(transpose(A) * Diagonal(randvec) * A)  # This is faster

println(transpose(Q) * Diagonal(randvec) * Q)  # This takes minutes
```


Also, should I just use `qr!` instead of `qr` to reduce allocations?

This doesn't run, because transpose isn't implemented for the implicit representation of Q:

```
ERROR: transpose not implemented for LinearAlgebra.QRCompactWYQ{ComplexF64, Matrix{ComplexF64}, Matrix{ComplexF64}}. Consider using adjoint instead of transpose.
```


Presumably you mean the conjugate transpose `adjoint(Q)`, or equivalently `Q'`, instead, since that is the more sensible operation on Q in the complex case.

Once I use adjoint, use BenchmarkTools.jl for proper benchmarking, and don't time println (which takes a long time for a huge matrix!), the timings look fine.

This code:

```julia
using LinearAlgebra, BenchmarkTools

n = 3
test_matrix = rand(ComplexF64, 4^n, 4^(n-2))
randvec = rand(4^n);
Q, R = @btime qr($test_matrix); A = @btime Matrix($Q*I);
@btime $A' * Diagonal(randvec) * $A;
@btime $Q' * Diagonal(randvec) * $Q;
@btime $Q' * diagm(randvec) * $Q;
```


gives:

```
julia> Q, R = @btime qr($test_matrix);
  4.574 μs (3 allocations: 4.78 KiB)

julia> A = @btime Matrix($Q*I);
  17.931 μs (5 allocations: 132.22 KiB)

julia> @btime $A' * Diagonal(randvec) * $A;
  116.552 μs (6 allocations: 128.12 KiB)

julia> @btime $Q' * Diagonal(randvec) * $Q;
  33.273 μs (9 allocations: 136.42 KiB)

julia> @btime $Q' * diagm(randvec) * $Q;
  30.414 μs (10 allocations: 168.45 KiB)
```


That is, the slowest by far is forming the dense matrix and multiplying by it.

This is not surprising: the time to multiply an m×m matrix by a dense m×m matrix Q is O(m³), whereas the time to multiply an m×m matrix by the implicit representation of Q in terms of Householder reflectors is O(m²n), where n is the number of columns of the original matrix that you QR-factored. Here m = 16n, so it's not surprising that this saves an order of magnitude.
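A quick sanity check of the equivalence behind that scaling argument, with shapes mirroring the thread's m = 16n. This checks correctness only; the timing claims come from the benchmarks above:

```julia
using LinearAlgebra

n = 64; m = 16n
F = qr(rand(ComplexF64, m, n))
D = Diagonal(rand(m))
R_dense    = (F.Q * I)' * D * (F.Q * I)  # dense path: each multiply costs O(m^3)
R_implicit = F.Q' * D * F.Q              # implicit path: each application costs O(m^2 n) = O(m^3)/16 here
```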
