mul! much slower than * for QRCompactWYQ



Suppose I have generated the following QRCompactWYQ:

using LinearAlgebra, BenchmarkTools

Q = qr(randn(500, 10)).Q;

and I want to do in-place multiplication of Q with

x = randn(500)


y = randn(500)


@btime mul!(y, Q, x)

This runs in 2.9 s on my machine, which is significantly slower than doing

@btime y = Q*x

which runs in 8.8μs.


The LAPACK routine that does this modifies the right-hand side directly. It is interfaced via lmul!(Q, x). I’m guessing it’s an oversight that the corresponding mul! method is missing.

The mul! function falls back to a generic method, which becomes extremely slow because Q is stored implicitly and each of its elements takes time to compute. It should be easy to write the missing method, as it just needs to copy x into y and then call lmul!.
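Until such a method exists, the copy-then-lmul! idea can be sketched as a small helper. This is only a minimal sketch: the name mul_q! is made up for illustration and is not part of LinearAlgebra, and it assumes y has the same length as x and a matching element type (Float64 here).

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): computes y = Q*x in place
# by copying x into y and letting lmul! overwrite y.
function mul_q!(y::AbstractVector, Q, x::AbstractVector)
    copyto!(y, x)   # y now holds a copy of x
    lmul!(Q, y)     # overwrites y with Q*y without forming Q's elements
    return y
end

Q = qr(randn(500, 10)).Q
x = randn(500)
y = similar(x)

mul_q!(y, Q, x)
y ≈ Q * x   # should agree with the out-of-place product
```

The helper leaves x untouched, which is the main difference from calling lmul!(Q, x) directly.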


Thanks, I will create an issue/PR in Julia’s GitHub repo then.