# Speed up matrix multiplication with permuted vector

I have a very large sparse matrix of order `(1768, 39816100)` that goes into the following snippet (from the thread Computing sparse orthogonal projections - #2 by stevengj):

```julia
QR = qr(A::SparseMatrixCSC)  # A = sparse matrix of size (1768, 39816100)
y = QR.Q' * x[QR.prow]       # slow for x::SparseVector
y[size(A,2)+1:end] .= 0      # zero out the trailing components
p = QR.Q * y
p[QR.prow] = p               # undo the row permutation
```

The multiplication of the (dense) `QR.Q'` with the permuted sparse `x[QR.prow]` is slow. Julia currently computes the vector `x[QR.prow]` and then does an optimized matmul. Note that `QR` is a SuiteSparse factorization object returned by the sparse QR decomposition and, as far as I know, `Q` is typically dense and stored in an optimized way for multiplication with a vector. Any ideas on how to improve this further?
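One way to sidestep the allocating `getindex` is to build the permuted sparse vector directly from the stored entries, touching only the nonzeros instead of all of `x`. This is a sketch, not code from the thread; `permute_sparse` is a hypothetical helper:

```julia
using SparseArrays

# Hypothetical helper: compute x[p] for a sparse x by remapping only the
# stored indices through invperm(p), instead of a full-length getindex.
# For y = x[p] we have y[invperm(p)[j]] = x[j] for every stored index j.
function permute_sparse(x::SparseVector, p::AbstractVector{<:Integer})
    ip = invperm(p)
    I, V = findnz(x)            # stored indices and values only
    newI = ip[I]                # where each stored entry lands in x[p]
    s = sortperm(newI)          # keep indices sorted for sparsevec
    sparsevec(newI[s], V[s], length(x))
end
```

Whether this beats the built-in `getindex` would need benchmarking at your sizes; it only helps if `x` is much sparser than its length.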

What’s `x` and `A`?

Edit: in terms of how it's constructed, what its type is, etc. It makes a difference for checking which code is called.

That’s not a documented field of the object returned by `qr(A)`; where does this come from?


Sorry, I am editing the original code as you asked.

Edit: I might have found the actual bottleneck: even `y = QR.Q' * x` is slow, so the problem might not be due to `getindex`. I think I am computing a dense matrix-vector multiplication.

Sorry, I still have trouble constructing this `x`. Does this look correct for an MWE? (Please provide a copy-pastable example next time.)

```julia
julia> A = sprand(10, 20, 0.5)
10×20 SparseMatrixCSC{Float64, Int64} with 93 stored entries:
⡙⠂⢍⣘⢪⣎⠴⢁⣹⡷
⠟⣅⢆⢀⣗⣠⡡⠨⡈⢬
⠈⠚⠘⠛⠂⠉⠉⠈⠊⠚

julia> x = sprand(10, 0.5)
10-element SparseVector{Float64, Int64} with 5 stored entries:
  [1 ]  =  0.766693
  [2 ]  =  0.66149
  [4 ]  =  0.385874
  [8 ]  =  0.90622
  =  0.931918
```
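For what it's worth, a fully copy-pastable sketch of the slow line on top of that MWE (assuming SuiteSparse's sparse `qr`, whose factorization object exposes `prow`) would be:

```julia
using SparseArrays, LinearAlgebra, Random

Random.seed!(0)            # reproducible stand-in data
A  = sprand(10, 20, 0.5)   # small stand-in for the (1768, 39816100) matrix
x  = sprand(10, 0.5)
QR = qr(A)                 # SuiteSparse SPQR factorization
y  = QR.Q' * x[QR.prow]    # the line that is slow at scale
```

At this toy size everything is fast, of course; the point is only that the types match the real problem (`SparseMatrixCSC` input, `SparseVector` `x`).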

The `x[QR.prow]` allocates a bunch of new memory, which may be what slows you down here, as I believe your `x` is quite large. Sadly, using a `view` here doesn’t seem to help much:

```julia
julia> @time y = QR.Q' * x[QR.prow];
  0.000117 seconds (2.18 k allocations: 126.141 KiB)

julia> @time y = QR.Q' * @view x[QR.prow];
  0.000104 seconds (2.17 k allocations: 125.047 KiB)
```
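One more thing that might be worth benchmarking: densify `x` once before indexing, so both the permutation and the `Q'` application run on plain dense vectors rather than through a generic sparse fallback. This is only an assumption about where the time goes, not a guaranteed win:

```julia
using SparseArrays, LinearAlgebra

A  = sprand(10, 20, 0.5)
x  = sprand(10, 0.5)
QR = qr(A)

xd = Vector(x)             # densify once: one allocation of length(x)
y  = QR.Q' * xd[QR.prow]   # dense permuted copy, then dense Q' application
```

The trade-off is one dense allocation of `length(x)` up front, which may or may not be acceptable at your sizes.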

I think at this point some expert on sparse matrices is needed. Nevertheless, doesn’t the suggestion at the end of the post you linked do what you want?


I checked: just computing `x[QR.prow]` is also slow.

Thanks for your suggestion. The suggested method works for the full-rank case, but I am developing an algorithm for the rank-deficient case, which requires the former method. I think `A\x` dispatches to `qr(A)\x`, which computes a basic least-squares solution (not suited to the rank-deficient case).

Also, in some cases `A::Adjoint{<:Any, <:AbstractSparseMatrix}`, and due to memory issues I cannot materialize the adjoint (I get an `OutOfMemoryError` with sparse `A'*A`), so I have to work with the orthogonal projection via `A.parent`.