Speed up matrix multiplication with permuted vector

I have a very large sparse matrix of size (1768, 39816100) that goes into the following (part of Computing sparse orthogonal projections - #2 by stevengj):

QR = qr(A::SparseMatrixCSC) # A = sparse matrix of size (1768, 39816100)
y = QR.Q' * x[QR.prow] # slow for x::SparseVector
y[size(A,2)+1:end] .= 0 # zero the components orthogonal to the column space of A (full-rank case)
p = QR.Q * y
p[QR.prow] = p # undo the row permutation

The multiplication of the (dense) QR.Q' with the permuted sparse vector x[QR.prow] is slow. Julia currently computes the vector x[QR.prow] first and then does an optimized matrix-vector product. Note that QR is a SuiteSparse factorization object returned by the sparse QR decomposition and, AFAIK, Q is typically dense but stored in an optimized way for multiplying with a vector. Any ideas on how to improve this further?
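In other words, as far as I understand it, the expression above is evaluated in two steps, roughly like this (just a sketch of my reading of it, not of any internal API):

xp = x[QR.prow] # getindex builds a new SparseVector first
y = QR.Q' * xp  # then Q' is applied to that vector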

What are x and A?

Edit: In terms of how they are constructed, what their types are, etc. It makes a difference for checking which code gets called :slight_smile:


That’s not a documented field of the object returned by qr(A); where does this come from?


Sorry, I am editing the original code as you asked :grinning:

Edit: I might have found the actual bottleneck: even y = QR.Q' * x is slow, so the problem might not be due to getindex. I think I am computing a dense matrix-vector multiplication.
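One way to check this hypothesis (a sketch; it assumes BenchmarkTools is installed and that x fits in memory as a dense Vector):

using BenchmarkTools
Qt = QR.Q'
xd = Vector(x)      # densified copy of x
@btime $Qt * $x;    # current call with the SparseVector
@btime $Qt * $xd;   # same product with a plain dense Vector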

Sorry, I still have trouble constructing this x :sweat_smile:

Does this look correct for an MWE? (Please provide a copy-pastable example next time :slight_smile:)

julia> A = sprand(10,20,0.5)                                  
10×20 SparseMatrixCSC{Float64, Int64} with 93 stored entries: 
⡙⠂⢍⣘⢪⣎⠴⢁⣹⡷                                                    
⠟⣅⢆⢀⣗⣠⡡⠨⡈⢬                                                    
⠈⠚⠘⠛⠂⠉⠉⠈⠊⠚                                                    
                                                              
julia> x = sprand(10, 0.5)                                    
10-element SparseVector{Float64, Int64} with 5 stored entries:
  [1 ]  =  0.766693                                           
  [2 ]  =  0.66149                                            
  [4 ]  =  0.385874                                           
  [8 ]  =  0.90622                                            
  [10]  =  0.931918                                           

The x[QR.prow] allocates a bunch of new memory, which may be what slows you down here, as I believe your x is quite large. Sadly, using a view here doesn’t seem to help much (see the sketch after the timings):

julia> @time y = QR.Q' * x[QR.prow];                
  0.000117 seconds (2.18 k allocations: 126.141 KiB)
                                                    
julia> @time y = QR.Q' * @view x[QR.prow];          
  0.000104 seconds (2.17 k allocations: 125.047 KiB)
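If the allocation itself is part of the problem, one could preallocate a dense buffer and scatter the nonzeros of x into it with the permutation already applied, instead of building x[QR.prow] on every call. A rough sketch (the names ip and xp are made up here; it assumes x is a SparseVector and QR.prow is a permutation of 1:length(x)):

using SparseArrays

ip = invperm(QR.prow)             # compute once
xp = zeros(eltype(x), length(x))  # reusable dense buffer

fill!(xp, 0)                      # reset between calls
nzind, nzval = findnz(x)
for (i, v) in zip(nzind, nzval)
    xp[ip[i]] = v                 # fills xp so that xp == x[QR.prow]
end
y = QR.Q' * xp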

I think at this point an expert on sparse matrices etc. is needed :sweat_smile:

Nevertheless, doesn’t the suggestion at the end of the post you linked do what you want? This one:


I checked: just computing x[QR.prow] is also slow.

Thanks for your suggestion. The suggested method works for the full-rank case, but I am developing an algorithm for the rank-deficient case, which requires the former method. I think A\x dispatches to qr(A)\x, which computes a basic least-squares solution (not suitable for the rank-deficient case).
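Concretely, the rank-deficient variant I have in mind keeps only the first r components of y, where r is the numerical rank, instead of the first size(A,2). Something like this (a sketch; the rank estimate below is crude and would need a proper tolerance):

using LinearAlgebra, SparseArrays

r = count(!iszero, diag(QR.R))  # crude numerical-rank estimate from the R factor
y = QR.Q' * xp                  # xp = permuted right-hand side, as above
y[r+1:end] .= 0                 # keep only the first r components
p = QR.Q * y
p[QR.prow] = p                  # undo the row permutation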

Also, in some cases A::Adjoint{<:Any, <:AbstractSparseMatrix}, and due to memory issues I cannot materialize the adjoint (I get an OutOfMemoryError with sparse A'*A) and have to work with the orthogonal projection via A.parent.
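For what it’s worth, when only the action of A'*A on a vector is needed (e.g. inside an iterative solver), it does not have to be materialized; with B = A.parent it can be applied as two sparse matrix-vector products (a sketch):

B = A.parent               # the underlying SparseMatrixCSC
applyAtA(v) = B * (B' * v) # action of A' * A == B * B' without ever forming it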