Copying a QR decomposition matrix Q is extremely slow

This is on julia 1.5.3

julia> f = qr(ones(100, 20), Val(true));

julia> copy(f.Q); @time copy(f.Q);
  0.057310 seconds (40.00 k allocations: 338.516 MiB, 10.85% gc time)

julia> copy(Array(f.Q)); @time copy(Array(f.Q));
  0.000074 seconds (7 allocations: 84.969 KiB)

julia> typeof(copy(f.Q)) == typeof(copy(Array(f.Q)))
true

The copy appears to be called in svdvals which slows to a crawl for large matrices. Is this a known issue?

It appears that copy(f.Q) relies on a default method for AbstractArrays, which might explain the difference in performance. Is this something that may be improved?

Upon further introspection it appears that indexing into Q is slow.

function getindex(Q::AbstractQ, i::Integer, j::Integer)
    x = zeros(eltype(Q), size(Q, 1))
    x[i] = 1
    y = zeros(eltype(Q), size(Q, 2))
    y[j] = 1
    return dot(x, lmul!(Q, y))
end

results in

julia> @btime $f.Q[1, 1];
  22.795 μs (6 allocations: 48.84 KiB)

As a consequence

julia> @time f.Q[:,:];
 30.861779 seconds (4.00 M allocations: 46.529 GiB, 2.50% gc time)

I’m definitely not an expert here, but is that because theq isn’t present as a dense matrix until you copy it. It’s stored in packed form as a product of Householder reflectors. Do you actually a dense copy? You can multiply by it while still in packed form.

2 Likes

@malacroi is correct. Extracting Q from the data structure you get from qr or qr! is expensive. IIRC, it’s O(N^3) work. If you just want to multiply D by Q’, for example, then f.Q'*B is what you want.

For example

julia> A=rand(1000,1000); AC=copy(A); AF=qr!(A);

julia> C=AF.Q*AF.R; norm(C-AC)
4.49007e-13

julia> D=AF.Q'*AC; RC=copy(AF.R); norm(D-RC)
1.27313e-13

The copy of AF.R is cheap because you don’t have to compute anything new. Not so if you do AF.Q.

1 Like

The fast way to extract the Q factor as a dense matrix is just what you already do in your first example, namely Array(f.Q). Indeed, copy will probably fall back on a general method that uses indexing, which is slow for the way f.Q is stored/represented.

If you want to make a copy of an array or some object that behaves as it, but is maybe not explicit (like a Q factor, or a transposed matrix), and you want the result to be a explicit Array that has fast indexing and its own memory, then Array(object) is a good way.

1 Like

Looks like we’re missing a specialized copy(x::QRPackedQ) = Array(x) method for performance. But maybe that method should even return a QRPackedQ object, since copy doesn’t have to return an Array. You could file an issue in GitHub.

C=Array(AF.Q) is ok if you must have Q. It is far better than copy. The allocation burden seems a bit steep for the 8MB you need to store Q.

julia> A=rand(1000,1000); AF=qr!(A);

julia> @btime C=Array($AF.Q);
  31.310 ms (8 allocations: 22.89 MiB)

Currently svdvals relies on a copy.

svdvals(A::AbstractMatrix{<:BlasFloat}) = svdvals!(copy(A))

I realize that I may evaluate svdvals(Array(Q)) instead of svdvals(Q), however is it reasonable to expect svdvals to perform this by itself? This is a different question from whether we should implement a specific method for copy(::AbstractQ), as — from the suggestions above — copy might not be be the best choice for certain types of matrices.