Usage of QR Factors

cortner · August 23, 2019, 5:37am

I’d like to compute QR factorisation Q, R = qr(A) followed by the operation Matrix(Q)' * Y. As far as I can tell, Matrix(Q) allocates another matrix the same size as A which I’d like to avoid.

Is there a way to perform Matrix(Q)' * Y that doesn’t allocate the matrix Matrix(Q)? I can of course do (Q' * Y)[1:Ncols] where Ncols where Ncols = size(A,2), but if Ncols << Nrows then I would prefer to avoid this if possible.

antoine-levitt · August 23, 2019, 6:55am

Hi !

What’s wrong with q'a? That dispatches to gemqrt which is efficient and doesn’t allocate.

carstenbauer · August 23, 2019, 6:55am

If it is about allocations (not necessarily speed), why convert Q at all? Simply Q' * Y should work with less allocations.

julia> using LinearAlgebra

julia> A = rand(3,3);

julia> Y = rand(3,3);

julia> Q, R = qr(A);

julia> using BenchmarkTools

julia> @btime Matrix($Q)' * $Y;
  770.550 ns (3 allocations: 480 bytes)

julia> @btime $(Q)'*$Y;
  625.294 ns (2 allocations: 320 bytes)

Regarding speed, for 100x100 I get

julia> @btime Matrix($Q)' * $Y;
  4.179 ms (6 allocations: 234.61 KiB)

julia> @btime $(Q)'*$Y;
  12.421 ms (4 allocations: 156.41 KiB)

cortner · August 23, 2019, 7:03am

Yes, that is what i do - see OP (with a typo, now corrected). I probably shouldn’t have worried about it, the question was partly out of curiosity, but partly motivated by the fact that my systems are on the order 1M x 10K.

antoine-levitt · August 23, 2019, 7:23am

OK, I see now: q’a is N x M, not M x M as I was expecting. Strangely it seems LAPACK doesn’t even expose a routine for that…

cortner · August 23, 2019, 7:32am

I see - so it would be a matter of deciding the representation Q and implementing it in Julia. Probably not worth it. The runtime difference always seems to be less than O(10) anyhow.

tkf · August 23, 2019, 7:37am

I just documented how Q is stored in memory: https://docs.julialang.org/en/latest/stdlib/LinearAlgebra/#LinearAlgebra.QRCompactWY (see the quick-and-dirty Julia implementation to recover Q here: https://github.com/JuliaLang/julia/pull/32996) Maybe you can write it in Julia if LAPACK does not have it? It sounds like something we need if we are to expose thin rectangular Q: https://github.com/JuliaLang/julia/issues/26591

cortner · August 23, 2019, 8:42pm

Thanks for this - that was useful.

I agree - this seems to be the only operation I couldn’t carry out directly with the type returned by qr.

dlfivefifty · August 23, 2019, 10:31pm

You can also try QR (as opposed to QRCompactWY), I believe you can construct it via LinearAlgebra.qrfactUnblocked!. It has the benefit that that the subdiagonals of F.factors along with F.τ tell you what the Householder reflectors are, so are easy to use in a custom implementation.

This is what I use in BandedMatrices new implementation of QR, it was remarkably easy to get working.

cortner · August 23, 2019, 10:46pm

Thanks for the suggestion. Do you know if there Is any/much performance difference ? Im particularly worries about memory usage. My LSQ systems are at the limit of what my system can handle.

tkf · August 23, 2019, 11:00pm

QR is very slightly more memory-efficient than QRCompactWY, because QR stores a vector τ of length min(m,n) while QRCompactWY stores a matrix T of size min(m,n,36) x min(m,n) (36: default block size). But both structs have a matrix factors of size m x n so I’d say it is pretty negligible. IIUC the benefit of QRCompactWY is performance since you can use matrix-matrix multiplication as a building block of the algorithm rather than matrix-vector multiplication.

dlfivefifty · August 26, 2019, 3:36pm

Note that LinearAlgebra.qrfactUnblocked! calls slow, generic Julia code, so you’ll need to call LAPACK directly:

fastQR!(A) = QR(LAPACK.geqrf!(A)...);

As @tkf said supposedly QRCompactWY takes advantage of BLAS Level 3 more, but in a quick test on a 10_000 x 1000 matrix I didn’t see any performance difference between QR and QRCompactWY.

tkf · August 26, 2019, 5:40pm

Maybe the default block size (currently 36) for QRCompactWY is too small? Increasing it to something like 100 did help a lot. In Julia master it now has blocksize argument so you can increase it: https://github.com/JuliaLang/julia/pull/33053. Of course, you can call LAPACK.geqrt! directly too. I wonder if we should increase the default.

Topic		Replies	Views
Extracting Q from qrfact allocation-free General Usage	8	1397	April 1, 2018
Efficient QR factorization Numerics linearalgebra	3	738	May 30, 2019
Performance issue with QRCompactWYQ Performance performance , linearalgebra , qr	1	458	January 26, 2021
Economy-sized QR decomposition Performance qr	10	2208	April 6, 2021
Copy and collect slow on QRCompactWYQ Performance linearalgebra	1	583	April 14, 2019

Usage of QR Factors

Related topics