Performance issue with QRCompactWYQ

Hey, since it’s not specifically a bug but a performance issue, I’m posting it here instead of on GitHub.
While trying to generate random positive definite matrices, I realized there is a huge performance issue with the QRCompactWYQ type. Here is an example:

using BenchmarkTools, LinearAlgebra
A = Diagonal(exp.(rand(100))) # Diagonal matrix of positive eigenvalues
B, _ = qr(rand(100, 100)) # We obtain an orthogonal (unitary) matrix, stored as QRCompactWYQ
C = Matrix(B) # Dense copy, for comparison
@btime Symmetric($B * $A * $(B)')
## 149.708 ms (30008 allocations: 25.86 MiB)
@btime Symmetric($C * $A * $(C)')
## 71.447 μs (6 allocations: 156.50 KiB)

That is roughly a 2000x speedup…

The issue here is that the factor Q of the factorization A=QR is stored in Compact WY format, as documented here. So if you multiply the factor Q with another matrix, this is not done via the fast BLAS functions but via the slow generic fallback.
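To make the storage point concrete, here is a minimal sketch of the workaround: pull out the Q factor, check that it is not a plain Matrix, and materialize it once before doing any matrix products. The variable names (F, Q, Qdense, D, S) are only for illustration, and the exact type printed for Q may differ between Julia versions.

```julia
using LinearAlgebra

F = qr(rand(100, 100))
Q = F.Q                          # Q factor in its compact internal format
println(typeof(Q))               # shows the wrapper type (version dependent)
println(Q isa Matrix)            # the factor is not a dense Matrix

Qdense = Matrix(Q)               # materialize Q once as a dense matrix
D = Diagonal(exp.(rand(100)))    # positive eigenvalues
S = Symmetric(Qdense * D * Qdense')  # dense products on ordinary matrices
```

The conversion costs one extra allocation of a 100×100 matrix, which is small compared with the time difference measured above.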

Maybe the documentation should be clearer on this and state explicitly that Q should be converted into a regular matrix before matrix-matrix operations?

Another way would be to dispatch such operations and do this conversion behind the scenes.
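As a rough user-level illustration of "converting behind the scenes", something like the following could be used. Note that conjugate_dense is a hypothetical helper name, not anything in LinearAlgebra, and this is only a sketch of the idea, not how the library itself would or should implement such a dispatch.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): computes Q * D * Q'
# after converting Q to a dense Matrix, so the products run on plain matrices.
function conjugate_dense(Q, D)
    Qd = Matrix(Q)               # one-time conversion of the Q factor
    return Symmetric(Qd * D * Qd')
end

F = qr(rand(100, 100))
D = Diagonal(exp.(rand(100)))
S = conjugate_dense(F.Q, D)      # random symmetric positive definite matrix
```

The caller passes the Q factor as returned by qr and never has to think about its storage format; the cost is the extra dense copy made inside the helper.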
