The computation of the least-squares solution of linear equations using `x = A\b`

is occasionally substantially slower than the attainable speed using LAPACK based wrappers. For example, let `A`

be a full matrix `A = rand(1000,700)`

and `b = rand(1000)`

and let `At`

be an upper triangular matrix `At = triu(A)`

and `A10`

a Hessenberg-like matrix `A10 = triu(A,-10)`

with 10 subdiagonals. The run-times on my machine for A\b, At\b and A10\b are of the same order of magnitudes as can be seen below:

```
n = 1000; m = 700;
A = rand(n,m); A10 = triu(a,-10); At = triu(a); b = rand(n);
@time A\b;
@time At\b;
@time A10\b;
```

The timing results are:

```
julia> @time A\b;
0.134719 seconds (5.65 k allocations: 13.607 MiB)
julia> @time At\b;
0.123878 seconds (5.56 k allocations: 13.576 MiB)
julia> @time A10\b;
0.121700 seconds (5.65 k allocations: 13.762 MiB)
```

As can be observed, there are practically no differences in the timing figures for the three matrices.

However, by using an experimental implementation (`qrsolve!`

, see below) of the QR-decomposition based solution approach relying on wrappers to the LAPACK families `_LARFG.F`

and ` _LARF.F`

implemeting the computation of Householder transformations and their applications, the results are much better for the matrices `At`

and `A10`

as can be seen from the following figures:

```
@time qrsolve!(A,b);
@time qrsolve!(At,b);
@time qrsolve!(A10,b);
```

for which the timing results are:

```
julia> @time qrsolve!(A,b);
0.118777 seconds (4.91 k allocations: 9.708 MiB)
julia> @time qrsolve!(At,b);
0.003154 seconds (4.91 k allocations: 9.708 MiB)
julia> @time qrsolve!(A10,b);
0.011451 seconds (4.91 k allocations: 9.708 MiB)
```

The reason for this pleasant behaviour lies in the implementation of `_LARF.F`

, which automatically discovers trailing zeros in the successive columns of `A`

and applies the transformations using only the rest of nonzero entries. This strategy is not implemented in the function `reflectorApply!`

(which is used for example in `qrfactUnblocked!`

), so even cases when no computations are needed (e.g., `A`

upper triangular), the transformations (all identity matrices) are still fully applied.

The wrappers to the LAPACK families `_LARFG.F`

and ` _LARF.F`

are available in the MatrixPencils.jl collection.

Here is the experimental function `qrsolve!`

, implemeted for the particular case of tall matrices.

```
function qrsolve!(A::AbstractMatrix{T},b::AbstractVector{T}) where {T}
require_one_based_indexing(A)
m, n = size(A)
m < n && error("Column dimension exceeds row dimension")
for k = 1:n
x = view(A, k:m, k)
τk, β = larfg!(x)
larf!('L', x, τk, view(A, k:m, k + 1:n))
larf!('L', x, τk, view(b, k:m, 1:1))
A[k,k] = β
end
return UpperTriangular(triu(A[1:n,:]))\b[1:n]
end
```