# Usage of QR Factors

I’d like to compute QR factorisation `Q, R = qr(A)` followed by the operation `Matrix(Q)' * Y`. As far as I can tell, `Matrix(Q)` allocates another matrix the same size as `A` which I’d like to avoid.

Is there a way to perform `Matrix(Q)' * Y` that doesn’t allocate the matrix `Matrix(Q)`? I can of course do `(Q' * Y)[1:Ncols, :]` where `Ncols = size(A,2)`, but if `Ncols << Nrows` then I would prefer to avoid this if possible.
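For concreteness, here is a small sketch of the workaround under discussion (variable names are illustrative): apply `Q'` in place to a copy of `Y` with `lmul!`, then take a view of the first `Ncols` rows rather than slicing, so the only allocation is the copy of `Y` itself, never `Matrix(Q)`:

```julia
using LinearAlgebra

A = rand(1000, 10)   # Ncols << Nrows
Y = rand(1000, 5)
Q, R = qr(A)

Z = lmul!(Q', copy(Y))             # in-place Q'Y; allocates only the copy of Y
Zthin = @view Z[1:size(A, 2), :]   # the Ncols x 5 block, as an allocation-free view
```

`Zthin` should match `Matrix(Q)' * Y` up to rounding, without ever materializing `Q`.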

Hi!

What’s wrong with `Q' * Y`? That dispatches to `gemqrt!`, which is efficient and doesn’t allocate `Q`.

If it is about allocations (not necessarily speed), why convert `Q` at all? Simply `Q' * Y` should work with less allocations.

```julia
julia> using LinearAlgebra

julia> A = rand(3,3);

julia> Y = rand(3,3);

julia> Q, R = qr(A);

julia> using BenchmarkTools

julia> @btime Matrix($Q)' * $Y;
  770.550 ns (3 allocations: 480 bytes)

julia> @btime $(Q)' * $Y;
  625.294 ns (2 allocations: 320 bytes)
```

Regarding speed, for `100x100` I get

```julia
julia> @btime Matrix($Q)' * $Y;
  4.179 ms (6 allocations: 234.61 KiB)

julia> @btime $(Q)' * $Y;
  12.421 ms (4 allocations: 156.41 KiB)
```

Yes, that is what I do - see the OP (with a typo, now corrected). I probably shouldn’t have worried about it; the question was partly out of curiosity, but partly motivated by the fact that my systems are on the order of 1M x 10K.

OK, I see now: `Q' * Y` is N x M, not M x M as I was expecting. Strangely, it seems LAPACK doesn’t even expose a routine for that…

I see - so it would be a matter of deciding on a representation for `Q` and implementing it in Julia. Probably not worth it. The runtime difference always seems to be less than a factor of 10 anyhow.

I just documented how `Q` is stored in memory: https://docs.julialang.org/en/latest/stdlib/LinearAlgebra/#LinearAlgebra.QRCompactWY (see the quick-and-dirty Julia implementation to recover `Q` here: https://github.com/JuliaLang/julia/pull/32996). Maybe you can write it in Julia if LAPACK does not have it? It sounds like something we need if we are to expose a thin rectangular `Q`: https://github.com/JuliaLang/julia/issues/26591


Thanks for this - that was useful.

I agree - this seems to be the only operation I couldn’t carry out directly with the type returned by qr.

You can also try `QR` (as opposed to `QRCompactWY`); I believe you can construct it via `LinearAlgebra.qrfactUnblocked!`. It has the benefit that the subdiagonals of `F.factors`, along with `F.τ`, tell you what the Householder reflectors are, so they are easy to use in a custom implementation.

This is what I use in BandedMatrices’ new implementation of QR; it was remarkably easy to get working.
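To illustrate what such a custom implementation might look like, here is a minimal sketch (the helper name `applyQt!` is hypothetical, not part of LinearAlgebra) that applies `Q'` to a matrix in place using only the packed reflectors in `F.factors` and `F.τ`, assuming real element types:

```julia
using LinearAlgebra

# Hypothetical helper: apply Q' to Y in place from the packed reflectors,
# never materializing Q. Each reflector is H_k = I - τ[k] v_k v_k', with
# v_k = [1; F.factors[k+1:end, k]]. For real τ each H_k is symmetric, so
# Q'Y = H_n ⋯ H_1 Y, i.e. apply the reflectors in ascending order.
function applyQt!(F::LinearAlgebra.QR{<:Real}, Y::AbstractMatrix)
    m, n = size(F.factors)
    for k in 1:min(m, n)
        v = [1; F.factors[k+1:m, k]]      # reflector vector (unit leading entry)
        Yk = view(Y, k:m, :)
        Yk .-= (F.τ[k] .* v) * (v' * Yk)  # Yk ← H_k * Yk
    end
    return Y
end
```

As a sanity check, applying this to a copy of `A` itself should reproduce `F.R` in the first `n` rows, with (numerically) zero rows below.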


Thanks for the suggestion. Do you know if there is any/much performance difference? I’m particularly worried about memory usage; my LSQ systems are at the limit of what my system can handle.

`QR` is very slightly more memory-efficient than `QRCompactWY`, because `QR` stores a vector `τ` of length `min(m,n)` while `QRCompactWY` stores a matrix `T` of size `min(m,n,36)` x `min(m,n)` (36: default block size). But both structs have a matrix `factors` of size `m` x `n` so I’d say it is pretty negligible. IIUC the benefit of `QRCompactWY` is performance since you can use matrix-matrix multiplication as a building block of the algorithm rather than matrix-vector multiplication.
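To make the storage comparison concrete, a quick sketch (using the unexported `LinearAlgebra.qrfactUnblocked!` mentioned above to get a `QR`):

```julia
using LinearAlgebra

A = rand(1000, 100)
F1 = qr(A)                                    # QRCompactWY
F2 = LinearAlgebra.qrfactUnblocked!(copy(A))  # QR (unexported helper)

size(F1.factors)  # (1000, 100) -- both types carry the full m x n factors
size(F1.T)        # (36, 100) with the default block size of 36
length(F2.τ)      # 100: just one coefficient per reflector
```

So the dominant cost in both cases is the `m` x `n` `factors` matrix; the `T` block vs `τ` vector difference is in the noise.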


Note that `LinearAlgebra.qrfactUnblocked!` calls slow, generic Julia code, so you’ll need to call LAPACK directly:

```julia
fastQR!(A) = QR(LAPACK.geqrf!(A)...)
```

As @tkf said, supposedly `QRCompactWY` takes more advantage of BLAS Level 3, but in a quick test on a 10_000 x 1000 matrix I didn’t see any performance difference between `QR` and `QRCompactWY`.

Maybe the default block size (currently 36) for `QRCompactWY` is too small? Increasing it to something like 100 helped a lot. On Julia `master` there is now a `blocksize` argument so you can increase it: https://github.com/JuliaLang/julia/pull/33053. Of course, you can call `LAPACK.geqrt!` directly too. I wonder if we should increase the default.
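A sketch of calling `LAPACK.geqrt!` directly with a larger block (the choice of `nb = 100` below is just an example; on Julia `master` you could instead pass `qr(A; blocksize=100)`):

```julia
using LinearAlgebra

A = rand(1000, 100)
nb = 100  # block size; T must be nb x min(m, n), nb ≤ min(m, n)
factors, T = LAPACK.geqrt!(copy(A), similar(A, nb, 100))
F = LinearAlgebra.QRCompactWY(factors, T)   # wrap back into the usual struct
```

The resulting `F` is the same factorization as `qr(A)`, just with a bigger `T` block: `Matrix(F.Q) * F.R ≈ A` should still hold.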