Copy and collect slow on QRCompactWYQ

The functions copy and collect are extremely slow on the Q factor returned by a QR factorization (a QRCompactWYQ). Matrix produces the same result but is much faster, so the easy workaround is to call Matrix instead of copy or collect. Is there a good reason for these routines to be so much slower? If not, fixing this would surely help some people.

The following example is with Julia 1.1.0, on an iMac.

julia> using LinearAlgebra, BenchmarkTools
julia> n = 10;
julia> q = qr(randn(n,n)).Q;
julia> @btime x = copy($q);
  81.203 μs (301 allocations: 47.75 KiB)

julia> @btime y = collect($q);
  80.980 μs (301 allocations: 47.75 KiB)

julia> @btime z = Matrix($q);
  5.253 μs (2 allocations: 1.75 KiB)

julia> x == y == z
true

julia> n = 100;
julia> q = qr(randn(n,n)).Q;
julia> @btime x = copy($q);
  114.106 ms (30002 allocations: 25.71 MiB)

julia> @btime y = collect($q);
  113.197 ms (30002 allocations: 25.71 MiB)

julia> @btime z = Matrix($q);
  114.343 μs (4 allocations: 156.41 KiB)

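As you can imagine, it becomes much worse with larger matrices: going from n = 10 to n = 100, copy and collect slow down by a factor of roughly 1400, while Matrix slows down by only about 22.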

@which is a good way to figure out which method each of these calls dispatches to. I suspect Matrix is overloaded specifically for QRCompactWYQ to call the fast lmul! implementation, while copy and collect still use the generic fallbacks based on getindex.
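As a minimal sketch of that check (the variable name q and the size 10 are only illustrative), the following prints the method selected for each call. What it prints depends on the Julia version, so no output is shown here; outside the REPL, @which needs InteractiveUtils to be loaded explicitly.

```julia
using LinearAlgebra      # qr
using InteractiveUtils   # @which (loaded automatically in the REPL)

# Q factor of a small random matrix, as in the benchmark above
q = qr(randn(10, 10)).Q

# Print the method each call dispatches to, with its source location.
# A method defined in LinearAlgebra's QR code suggests a specialized path;
# one defined for a generic abstract type suggests a fallback.
println(@which copy(q))
println(@which collect(q))
println(@which Matrix(q))
```

If copy and collect resolve to generic array methods while Matrix resolves to a method in the QR code, that would be consistent with the timings in the first post.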
