Extract submatrix from QR factor is slow Julia 1.4

mleprovost · March 27, 2020, 6:12am

Hello everyone,

I have a pretty large matrix A \sim 600 \times 200 on which I would like to perform a QR decomposition.

Performing the QR decomposition takes 10ms, that seems slow to me, is there a way to make it faster.

In the end, I only need to access a submatrix of Q and the matrix R. But forming the submatrix Q is as slow as performing the qr decomposition!!! I have tried with Matrix{Float64}(F.Q) and F.Q*Matrix(I, blabla, blabla) but both are very slow.

Here is a toy version of my code:

using BenchmarkTools
N = 200
nx = 597
λ = 0.1

ψ = rand(nx,N)
ψ_aug = zeros(N+nx, nx)

ψ_aug[1:N,:] .= ψ'
ψ_aug[N+1:N+nx,:] .= Matrix(sqrt(λ)*I, nx, nx)

@btime F = qr(ψ_aug)

Q1 = zeros(N, nx)
@btime Q1 .= view(F.Q*Matrix(I, N+nx, N+nx), 1:N, 1:nx)
@btime Q1 .= (F.Q*Matrix(I,N+nx, N+nx))[1:N, 1:nx]
@btime Q1 .= Matrix{Float64}(F.Q)[1:N, 1:nx];


ψbis  = rand(4, N) 

A = ψbis - (ψbis*Q1)*Q1';

Output of the code:

10.438 ms (7 allocations: 3.96 MiB)
12.054 ms (14 allocations: 9.08 MiB)
10.338 ms (15 allocations: 9.99 MiB)
8.910 ms (13 allocations: 10.89 MiB)

I welcome any other speed improvement on this code.

carstenbauer · March 27, 2020, 8:39am

Why not just

julia> @btime Q1 = view(F.Q, 1:N, 1:nx);
  115.400 ns (4 allocations: 160 bytes)

For comparison, on my computer

julia> @btime Q1 .= Matrix{Float64}(F.Q)[1:N, 1:nx];
  34.454 ms (13 allocations: 10.89 MiB)

mleprovost · March 27, 2020, 4:20pm

Unfortunatley, this solution only moves the problem since computing the A matrix (last line of the code) is now super expensive and takes 16s (timed with @btime).

Concerning the QR decomposition:
Can I set a partial qr decomposition?
Is there a qr routine optimized for multi-threading?

stevengj · March 27, 2020, 4:58pm

The Householder QR factorization (a variant of which is used by Julia’s qr function) does not compute Q explicitly — instead, it stores a sequence of “Householder reflector” vectors that can be used to compute Q*x (or Q'*x) for any vector x almost as quickly as if you had an explicit Q matrix. See, for example, these notes.

When you call Matrix(F.Q) to compute Q explicitly, it basically has to multiply the Householder reflectors by the columns of the identity matrix I. To get a rough estimate of the cost, you can look at the flop count (floating-point additions and multiplications), and indeed the flop count for computing Q explicitly is similar to the flop count for the QR factorization itself for square matrices, and in fact is larger if the original matrix is very “tall” (non-square)!

To get a faster approach that avoids forming a Q submatrix explicitly, you should realize that Q1 = Q[1:N, 1:nx] (where Q is N+nx × N+nx) is equivalent to

Q1 = Matrix(I, N, N+nx) * Q * Matrix(I, N+nx, nx) = P1 * Q * P2

where P1 and P2 are very sparse. The trick is to exploit this sparsity, and multiply the P matrices by your other vectors rather than Q. You eventually want to compute (ψbis*Q1)*Q1'. This becomes:

(ψbis*Q1)*Q1' == ((ψbis*P1) * Q) * (P2*P2') * Q' * P1'

The key is to realize that al these multiplications by P are equivalent to extracting a submatrix or padding with zeros, but we can do this after we multiply by Q or Q'. That is, we are doing:

f(ψbis, Q, N, nx) = ([([ψbis zeros(size(ψbis,1), nx)] * Q)[axes(ψbis,1), 1:nx] zeros(size(ψbis,1), N)] * Q')[axes(ψbis,1), 1:N]

On my machine,

@btime f($ψbis, $(F.Q), $N, $nx);

takes about 850µs and produces the same result as (ψbis*Q1)*Q1', whereas QR factorization took about 12ms.

Ideally, this would be automated. For that to happen, F.Q[range1, range2] would have to return a new kind of matrix-like object that, upon multiplication, implicitly pads its input with zeros and extracts range1 rows of the result (or vice versa for left-multiply or multiplication by the transpose). This is certainly possible to implement in Julia (in the same way that F.Q itself is a matrix-like object that implicitly multiplies by the Householder reflectors). A PR for this functionality might be of interest.

PS. To get the first nx columns of Q, a faster way is to multiply F.Q * Matrix(I, size(F.Q, 1), nx), but I don’t know of similarly easy and efficient way to get the first N rows of the first nx columns. (The algorithm for the latter in principle exists, but it would have to be implemented.)

Topic		Replies	Views
QR Decomposition Extremely Slow General Usage linearalgebra	4	1379	October 30, 2018
Copying a QR decomposition matrix Q is extremely slow Performance question , linearalgebra	7	965	November 22, 2020
Constructing Q matrix from QR decomposition Performance linearalgebra , matrices	3	768	February 19, 2024
QR decomposition in julia 2x slower than octave Performance question	17	2328	April 14, 2018
Usage of QR Factors General Usage linearalgebra	12	1993	August 26, 2019

Extract submatrix from QR factor is slow Julia 1.4

Related topics