Extracting Q from qrfact allocation-free

I am trying to extract the Q matrix from a QR decomposition without allocating. The following hack doesn’t work:

julia> using LinearAlgebra

julia> A = rand(3,2);

julia> qrf = qrfact(A);

julia> qrf.Q * Matrix{Float64}(I,2,2)
3×2 Array{Float64,2}:
 -0.554696   0.791229
 -0.64075   -0.603576
 -0.530803  -0.0982461

julia> Q = zeros(3,2);

julia> A_mul_B!(Q, qrf.Q, Matrix{Float64}(I,2,2))
┌ Warning: `A_mul_B!(C::AbstractMatrix, A::AbstractVecOrMat, B::AbstractVecOrMat)` is deprecated, use `mul!(C, A, B)` instead.
│   caller = top-level scope
└ @ Core :0
ERROR: DimensionMismatch("matrix A has dimensions (3,3), matrix B has dimensions (2,2)")
 [1] _generic_matmatmul!(::Array{Float64,2}, ::Char, ::Char, ::LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}, ::Array{Float64,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\site\v0.7\LinearAlgebra\src\matmul.jl:518
 [2] generic_matmatmul!(::Array{Float64,2}, ::Char, ::Char, ::LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}, ::Array{Float64,2}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\site\v0.7\LinearAlgebra\src\matmul.jl:509
 [3] mul! at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\site\v0.7\LinearAlgebra\src\matmul.jl:156 [inlined]
 [4] A_mul_B!(::Array{Float64,2}, ::LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}, ::Array{Float64,2}) at .\deprecated.jl:57
 [5] top-level scope

Any idea how to do this in v0.7 and preferably v0.6?

In my opinion,

copyto!(Q, CartesianIndices(Q), qrf.Q, CartesianIndices(Q))

should do what you want and not allocate, however

julia> using BenchmarkTools; @btime copyto!($Q, $(CartesianIndices(Q)), $(qrf.Q), $(CartesianIndices(Q)))
  5.764 μs (18 allocations: 1.88 KiB)
3×2 Array{Float64,2}:
 -0.945825    0.245839
 -0.323885   -0.668782
 -0.0226788  -0.701637

Honestly, I find QRCompactWYQ (qrf.Q isa QRCompactWYQ) extremely confusing:

julia> qrf.Q
3×3 LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}:
 -0.945825    0.245839   0.212082
 -0.323885   -0.668782  -0.669201
 -0.0226788  -0.701637   0.712174

julia> size(qrf.Q)
(3, 3)

julia> qrf.Q * zeros(2, 2) # fine, no problem at all
3×2 Array{Float64,2}:
 0.0  0.0
 0.0  0.0
 0.0  0.0

julia> copyto!(Q, qrf.Q) # oh no, can't do that
ERROR: BoundsError: attempt to access 3×2 Array{Float64,2} at index [Base.OneTo(3), Base.OneTo(3)]
 [1] throw_boundserror(::Array{Float64,2}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at ./abstractarray.jl:467
 [2] checkbounds at ./abstractarray.jl:432 [inlined]
 [3] copyto!(::Array{Float64,2}, ::LinearAlgebra.QRCompactWYQ{Float64,Array{Float64,2}}) at ./multidimensional.jl:1247
 [4] top-level scope

Ya it’s confusing. I am actually interested in the least allocating way to extract Q. I think even qrfact!(A) allocates for Q and R so it only overwrites A with intermediate results? Anyways, since I am not interested in R and this is a hot part of my program, I will write my own QR-like function. Thanks for the help.

1 Like

Note that for small matrices you could use qr from StaticArrays: https://github.com/JuliaArrays/StaticArrays.jl/blob/d3c5662aa8a8a65ab391dbb3bc58441dac3a1461/src/qr.jl#L4-L10. If A is larger, you could use LAPACK functions directly for BLAS float types:

These allow you to pass in work buffers that can be reused, saving a bit on allocation, but you’ll still need to extract the Q matrix from the modified A.

1 Like

You could file an issue at stdlib/LinearAlgebra for a R-less QR factorization and a method to obtain Q through the Householder transformations without allocating. The code for ldiv! reveals that for solving a system of equations Julia uses view(lmul!(adjoint(A.Q), b).

1 Like

I don’t think it is possible to do an R-less QR factorization, since R pretty much comes for free. But extracting the Q matrix allocation-free should be possible. I did a proof of concept for a fully in-place QR factorization in a bit over a 100 lines, but the only problem is getting it to give the right result!


1 Like

The LAPACK routines need some workspace which is allocated by the Julia wrappers. Is this too much allocation for you?

julia> A2=randn(1000,500); tmp=fill(0.0,size(A2,2));

julia> Q2=copy(A2);

julia> @time LAPACK.geqrf!(Q2,tmp);
  0.047948 seconds (7 allocations: 125.344 KiB)

julia> @time LAPACK.orgqr!(Q2,tmp);
  0.026197 seconds (6 allocations: 125.313 KiB)

julia> F=qrfact(A2);

julia> vecnorm(A2-Q2*F.R)  # F[:R] in v0.6

IIRC the size of the workspace depends on the preferred block size for level-3 BLAS, so it makes sense to conceal it from ordinary users.

I am aiming for 0 allocations :slight_smile:

Here is a proof of concept in about 160 lines of code with the API, GitHub - mohamed82008/InplaceQR.jl: Provides a fully in-place, thin implementation of the Householder-based QR decompositon in Golub and Van Loan's book.. A few allocations slipped through so it’s time for some optimization.

As mentioned in the issue and previously hinted by @Nosferican, lmul! does the trick.