Hi!

I am working with some large complex float matrices (up to 16000×16000). I need to do a QR decomposition of a rectangular matrix, and I require the Q matrix to be square. It seems that extracting the Q matrix is almost as expensive as doing the QR decomposition itself. Here is an MWE.

```
using LinearAlgebra
data = []
for size = 2:6
    test_matrix = rand(ComplexF64, 4^size, 4^(size-1))
    start_time = time()
    Q, R = qr(test_matrix)
    push!(data, time() - start_time)
    println("Size ", 4^size, " Time taken for QR ", time() - start_time, " Norm ", norm(Q*R - test_matrix))
    start_time = time()
    Q = Matrix(Q)
    println("Time taken for Complex conversion ", time() - start_time)
    flush(stdout)
end
```

The output (on an 11th-gen i5 laptop with 4 threads utilised) is below:

```
Size 16 Time taken for QR 0.00011110305786132812 Norm 2.052986419474234e-15
Time taken for Complex conversion 1.1920928955078125e-5
Size 64 Time taken for QR 0.0002079010009765625 Norm 1.0968218527953343e-14
Time taken for Complex conversion 0.0003440380096435547
Size 256 Time taken for QR 0.001828908920288086 Norm 8.146303347292701e-14
Time taken for Complex conversion 0.0008859634399414062
Size 1024 Time taken for QR 0.026747941970825195 Norm 3.031789555168408e-13
Time taken for Complex conversion 0.054368019104003906
Size 4096 Time taken for QR 0.42181897163391113 Norm 1.3781341883372822e-12
Time taken for Complex conversion 0.7480239868164062
```

These timings are from a subsequent run, so compilation time is excluded. But the Q obtained here is not square. In my actual calculation I am using

```
Q=ComplexF64.(Q)
```

instead of

```
Q=Matrix(Q)
```

The former gives a square matrix but is far slower than the latter (as pointed out here & here).

Subsequently, I need to do a matrix multiplication with the Q obtained here, as below.

```
M=Q1*S*Q2
```

where S is a diagonal matrix (of the proper dimension) and Q2 is another matrix obtained from a separate QR decomposition. Is there any way to obtain the square Q matrix cheaply? I don't need just matrix-vector products but matrix-matrix multiplication, so I can't use the compact Q object directly.

Any help will be much appreciated. Thank you.

`Matrix(QR.Q)` gives you the "thin" QR's Q, such that the number of columns equals the original number of columns, as explained in the documentation. If you want the "full" QR's square Q, as explained in the docs you should do:

To retrieve the "full" Q factor, an `m`×`m` orthogonal matrix, use `F.Q*I` or `collect(F.Q)`.
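For concreteness, here is a small sketch (with an arbitrary 8×3 example matrix) showing that the two extraction methods give differently sized matrices:

```
using LinearAlgebra

A = rand(ComplexF64, 8, 3)  # tall rectangular example
F = qr(A)

Q_thin = Matrix(F.Q)  # thin Q: 8×3, same column count as A
Q_full = F.Q * I      # full Q: 8×8, square

println(size(Q_thin))  # (8, 3)
println(size(Q_full))  # (8, 8)
```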

Then why do you need to form the Q matrix explicitly? The whole point of the implicit representation of Q returned by the `qr` algorithm (as a composition of Householder reflections) is that you can multiply by `Q` (or `Q'`) quickly, *without* the expensive step of computing and storing the entries of the Q matrix.
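A sketch of what that looks like in practice (the sizes here are arbitrary): the implicit Q is applied directly via its reflectors, and the result matches the dense product up to roundoff.

```
using LinearAlgebra

m, n = 1024, 64
A = rand(ComplexF64, m, n)
F = qr(A)
x = rand(ComplexF64, m)

y  = F.Q' * x   # applies the Householder reflectors directly; no m×m matrix is formed
Qd = F.Q * I    # dense m×m Q (the expensive step)
y ≈ Qd' * x     # same result, up to roundoff
```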


Thank you so much for the response.

Yeah, I finally need to do a matrix multiplication. But I find it too slow when doing that with Q directly, instead of first constructing a matrix from the QRCompactWYQ object. An MWE is given below:

```
using LinearAlgebra
size = 5
test_matrix = rand(ComplexF64, 4^size, 4^(size-2))
start_time = time()
Q, R = qr(test_matrix)
randvec = rand(4^size)
A = Matrix(Q*I)
println(transpose(A)*Diagonal(randvec)*A) # This is faster
println(transpose(Q)*randvec*Q) # This takes minutes
```

Also, should I just use `qr!` instead of `qr` to reduce allocations?
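(My understanding is that `qr!` is the in-place variant, which overwrites its input array with the factorization data and so saves one copy of the matrix; a minimal sketch of what I mean:)

```
using LinearAlgebra

A = rand(ComplexF64, 64, 16)
B = copy(A)          # qr! destroys its argument, so keep A for checking
F = qr!(B)           # in-place: B's storage now holds the reflectors and R
norm(F.Q * F.R - A)  # still factors the original values, up to roundoff
```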

This doesn't run, because `transpose` isn't implemented for the implicit representation of Q:

```
ERROR: transpose not implemented for LinearAlgebra.QRCompactWYQ{ComplexF64, Matrix{ComplexF64}, Matrix{ComplexF64}}. Consider using adjoint instead of transpose.
```

Presumably you mean the conjugate transpose `adjoint(Q)`, or equivalently `Q'`, instead, since that is a more sensible operation on Q in the complex case.

Once I use `adjoint`, use BenchmarkTools.jl for proper benchmarking, and don't time `println` (which takes a long time for a huge matrix!), then the timings look fine.

This code:

```
using LinearAlgebra, BenchmarkTools
size=3
test_matrix = rand(ComplexF64, 4^size,4^(size-2))
randvec = rand(4^size);
Q, R = @btime qr($test_matrix);
A = @btime Matrix($Q*I);
@btime $A' * Diagonal(randvec) * $A;
@btime $Q' * Diagonal(randvec) * $Q;
@btime $Q' * diagm(randvec) * $Q;
```

gives:

```
julia> Q, R = @btime qr($test_matrix);
  4.574 μs (3 allocations: 4.78 KiB)

julia> A = @btime Matrix($Q*I);
  17.931 μs (5 allocations: 132.22 KiB)

julia> @btime $A' * Diagonal(randvec) * $A;
  116.552 μs (6 allocations: 128.12 KiB)

julia> @btime $Q' * Diagonal(randvec) * $Q;
  33.273 μs (9 allocations: 136.42 KiB)

julia> @btime $Q' * diagm(randvec) * $Q;
  30.414 μs (10 allocations: 168.45 KiB)
```

That is, the slowest by far is forming the dense matrix and multiplying by it.

This is not surprising, because the time to multiply an m×m matrix by a dense m×m matrix Q is O(m^3), whereas the time to multiply an m×m matrix by the implicit representation of Q in terms of Householder reflectors is O(m^2 n), where n is the number of columns of the original matrix that you QR-factored. Here, m = 16n, so it's not surprising that this saves an order of magnitude.
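The operation counts for the benchmark above (where `size = 3`, so m = 64 and n = 4) can be sanity-checked with a back-of-the-envelope calculation; constant factors are ignored:

```
n = 4       # columns of the original matrix: 4^(size-2) with size = 3
m = 16n     # rows: 4^size = 64

dense_cost    = m^3      # multiply an m×m matrix by a dense m×m Q
implicit_cost = m^2 * n  # apply n Householder reflectors to an m×m matrix

dense_cost / implicit_cost  # m/n = 16, roughly an order of magnitude
```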
