# Linear least-squares solution occasionally much slower than what is achievable using LAPACK-based wrappers

The computation of the least-squares solution of linear equations using `x = A\b` is occasionally substantially slower than what is attainable with LAPACK-based wrappers. For example, let `A` be a full matrix `A = rand(1000,700)` and `b = rand(1000)`, and let `At` be an upper triangular matrix `At = triu(A)` and `A10` a Hessenberg-like matrix `A10 = triu(A,-10)` with 10 subdiagonals. The run times on my machine for `A\b`, `At\b` and `A10\b` are of the same order of magnitude, as can be seen below:

```
n = 1000; m = 700;
A = rand(n,m); A10 = triu(A,-10); At = triu(A); b = rand(n);
@time  A\b;
@time At\b;
@time A10\b;
```

The timing results are:

```
julia> @time  A\b;
  0.134719 seconds (5.65 k allocations: 13.607 MiB)

julia> @time At\b;
  0.123878 seconds (5.56 k allocations: 13.576 MiB)

julia> @time A10\b;
  0.121700 seconds (5.65 k allocations: 13.762 MiB)
```

As can be observed, there are practically no differences in the timing figures for the three matrices.

However, using an experimental implementation (`qrsolve!`, see below) of the QR-decomposition based solution approach, which relies on wrappers to the LAPACK families `_LARFG.F` and `_LARF.F` (implementing the computation of Householder transformations and their application), the results are much better for the matrices `At` and `A10`, as can be seen from the following figures:

```
@time  qrsolve!(A,b);
@time  qrsolve!(At,b);
@time  qrsolve!(A10,b);
```

for which the timing results are:

```
julia> @time  qrsolve!(A,b);
  0.118777 seconds (4.91 k allocations: 9.708 MiB)

julia> @time  qrsolve!(At,b);
  0.003154 seconds (4.91 k allocations: 9.708 MiB)

julia> @time  qrsolve!(A10,b);
  0.011451 seconds (4.91 k allocations: 9.708 MiB)
```

The reason for this pleasant behaviour lies in the implementation of `_LARF.F`, which automatically detects trailing zeros in the successive columns of `A` and applies each transformation only to the remaining nonzero entries. This strategy is not implemented in the function `reflectorApply!` (which is used, for example, in `qrfactUnblocked!`), so even in cases where no computation is needed (e.g., `A` upper triangular), the transformations (all identity matrices) are still fully applied.
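To illustrate the idea, here is a sketch of a trailing-zero-aware variant of `reflectorApply!`. The name `reflectorApplySkip!` is made up, and this is only an illustration of the strategy used inside `DLARF`, not the actual LAPACK code; as in `reflectorApply!`, the first entry of the reflector vector `x` is treated as an implicit 1:

```
using LinearAlgebra

# Apply (I - τ*v*v') to A from the left, with v = [1; x[2:end]],
# skipping the trailing zero part of v (illustrative only; DLARF
# does this internally via its ILADLR/ILADLC scans).
function reflectorApplySkip!(x::AbstractVector, τ::Number, A::AbstractMatrix)
    m, n = size(A)
    # find the last nonzero reflector entry
    last = m
    while last > 1 && iszero(x[last])
        last -= 1
    end
    @inbounds for j = 1:n
        vAj = A[1, j]
        for i = 2:last
            vAj += conj(x[i]) * A[i, j]
        end
        vAj = conj(τ) * vAj
        A[1, j] -= vAj
        for i = 2:last
            A[i, j] -= x[i] * vAj
        end
    end
    return A
end
```

For an upper triangular `A`, every reflector column is zero below the diagonal, so `last == 1` and each application reduces to a single scaling of the first row, which is where the large speedups come from.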

The wrappers to the LAPACK families `_LARFG.F` and `_LARF.F` are available in the MatrixPencils.jl collection.
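For reference, the Householder reflector that `_LARFG.F` computes can be sketched in plain Julia. This is a minimal real-valued illustration, not the actual LAPACK code, and the helper name `householder` is hypothetical:

```
using LinearAlgebra

# For a real vector x (not already a multiple of e₁), compute v, τ, β
# such that (I - τ*v*v')*x = [β; 0; ...; 0] with v[1] == 1,
# mirroring what DLARFG returns (illustrative only).
function householder(x::AbstractVector{<:Real})
    β = -copysign(norm(x), x[1])   # sign choice avoids cancellation
    v = copy(x)
    v[1] -= β                      # v = x - β*e₁
    τ = 2 * v[1]^2 / (v' * v)      # τ matching the normalized v
    v ./= v[1]                     # normalize so v[1] == 1
    return v, τ, β
end
```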

Here is the experimental function `qrsolve!`, implemented for the particular case of tall matrices:

```
using LinearAlgebra
using MatrixPencils  # assumed source of the larfg! and larf! wrappers
using Base: require_one_based_indexing

function qrsolve!(A::AbstractMatrix{T}, b::AbstractVector{T}) where {T}
    require_one_based_indexing(A)
    m, n = size(A)
    m < n && error("Column dimension exceeds row dimension")
    for k = 1:n
        x = view(A, k:m, k)
        τk, β = larfg!(x)
        larf!('L', x, τk, view(A, k:m, k + 1:n))
        larf!('L', x, τk, view(b, k:m, 1:1))
        A[k, k] = β
    end
    return UpperTriangular(triu(A[1:n, :])) \ b[1:n]
end
```
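A quick sanity check against the built-in solver (note that `qrsolve!` overwrites its arguments, so copies are passed):

```
using LinearAlgebra

n = 1000; m = 700;
A = rand(n, m); b = rand(n);
x1 = A \ b
x2 = qrsolve!(copy(A), copy(b))
norm(x1 - x2)   # should be on the order of machine precision
```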