Speed up matrix multiplication with permuted vector

I checked, and just computing x[QR.prow] on its own is also slow.
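For reference, here is a minimal sketch of how the permuted gather can be isolated and written into a preallocated buffer instead of allocating a new vector on every call. The helper name gather! and the small vectors are hypothetical stand-ins (p plays the role of QR.prow), and I have not measured whether this is actually faster on a large problem.

```julia
# Hypothetical helper: write x[p] into a preallocated buffer `out`,
# so the permuted gather does not allocate a fresh vector each time.
function gather!(out::AbstractVector, x::AbstractVector, p::AbstractVector{<:Integer})
    length(out) == length(p) || throw(DimensionMismatch("out and p must have the same length"))
    @inbounds for i in eachindex(out, p)
        out[i] = x[p[i]]
    end
    return out
end

# Toy data; in the real problem p would be the row permutation of the factorization.
p = [3, 1, 2]
x = [10.0, 20.0, 30.0]
buf = similar(x)

gather!(buf, x, p)   # same values as x[p], written into buf
```

Timing gather! separately from the matrix product (for example with BenchmarkTools) would show how much of the cost comes from the permutation alone.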

Thanks for your suggestion. The suggested method works for the full-rank case, but I am developing an algorithm for the rank-deficient case, which requires the former method. I think A\x dispatches to qr(A)\x, which computes a basic least-squares solution (not suited to the rank-deficient case).

Also, in some cases A::Adjoint{<:Any, <:AbstractSparseMatrix}, and because of memory limits I cannot materialize the adjoint (sparse A'*A throws an OutOfMemoryError), so I have to work with the orthogonal projection via A.parent.
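To make the setup concrete, here is a small sketch of the kind of object I mean. The matrix B and the vector x are made-up toy data. The point is only that A is a lazy wrapper around A.parent, and that products with A and with A.parent can be applied one after the other without ever forming A'*A.

```julia
using LinearAlgebra, SparseArrays

# Toy stand-in for the stored sparse matrix (hypothetical data).
B = sparse([1.0 0.0; 2.0 0.0; 0.0 3.0])   # 3x2 sparse matrix
A = B'                                    # lazy Adjoint wrapper, 2x3, no copy of the data

x = [1.0, 2.0, 3.0]

# Apply A = B' lazily, then apply A' = B through the parent.
# Together this gives A' * (A * x) without materializing A' * A.
y = A * x
z = parent(A) * y
```

Here parent(A) and A.parent both return the original B, so nothing the size of A'*A is ever allocated.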