Batched LU solves (or Factorizations) with Sparse Matrices

Let’s say I have a set of sparse matrices A1, A2, …, An, each of type CuSparseMatrixCSC{Float32, Int32}, along with associated RHS vectors b1, b2, …, bn. Is there some way to do a batched LU solve on the GPU with CUDA.jl?

If the matrices are dense, getrf_batched! works fine to get the LU factors. Is there some way to do batched solves/factorizations with sparse matrices, @maleadt?
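
For reference, a minimal sketch of that dense workflow (toy sizes; the exact return values of getrf_batched! may differ slightly across CUDA.jl versions):

```julia
using CUDA
using CUDA.CUBLAS

# Toy batch of dense matrices on the GPU
n, nbatch = 64, 16
As = [CUDA.rand(Float32, n, n) for _ in 1:nbatch]

# In-place batched LU with partial pivoting; recent CUDA.jl versions
# return the pivot sequences, the info codes, and the factored batch.
pivots, infos, As = CUBLAS.getrf_batched!(As, true)
```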

How big are the A matrices? You might just want to treat them as dense.

The matrices are large network matrices (e.g., 10k × 10k), and they are extremely sparse (99.99% or more of the entries are zero). So leaving them dense is not an option, sadly!


LinearSolve.jl’s SimpleGMRES has optimizations for batched solving and is compatible with GPUs. That’s likely to be the best option here, though if you can supply a good preconditioner that would be helpful.
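
A minimal sketch of what that might look like with the batch stacked into one block-diagonal system, assuming SimpleGMRES’s `blocksize` keyword is what switches on the batched code path (check the LinearSolve.jl docs for your version):

```julia
using LinearSolve, LinearAlgebra, SparseArrays
using CUDA, CUDA.CUSPARSE

# Hypothetical batch stacked as one block-diagonal system, equal block size n
n, nbatch = 1_000, 4
A = CuSparseMatrixCSR(blockdiag([sprand(Float32, n, n, 1e-3) + I for _ in 1:nbatch]...))
b = CUDA.rand(Float32, n * nbatch)

prob = LinearProblem(A, b)
# `blocksize` tells SimpleGMRES about the uniform block-diagonal structure
sol = solve(prob, SimpleGMRES(; blocksize = n))
```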

Thanks @ChrisRackauckas! Big fan of LinearSolve.jl. Unfortunately, I am solving linear systems within the context of a nonlinear optimization problem (an interior point method), where the associated matrices become progressively worse conditioned as convergence is approached (condition numbers of 10^20, even). Iterative methods (GMRES et al.) are well known to converge slowly or diverge in that regime, so they are not a great choice here. If there is a GPU-batched LU/QR solver in LinearSolve.jl, I would love to hear about it. If not, maybe I can help develop one in the future.

I see. MKLPardisoFactorize should do batching, I think?
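
Pardiso runs on the CPU rather than the GPU, but for completeness, a hypothetical single-system sketch (batching over the A_i would presumably be a loop, possibly threaded, over systems like this):

```julia
using LinearSolve, LinearAlgebra, SparseArrays
using Pardiso  # loading Pardiso.jl activates LinearSolve's Pardiso extension

# Hypothetical sparse system on the CPU
A = sprand(10_000, 10_000, 1e-4) + I
b = rand(10_000)

sol = solve(LinearProblem(A, b), MKLPardisoFactorize())
```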

@schev I suggest trying CUDSS.jl (GitHub - exanauts/CUDSS.jl).

It provides an interface to cuDSS, NVIDIA’s new sparse direct linear solver library.

They don’t provide a routine for solving batched sparse linear systems, but you can create one big sparse block-diagonal matrix diag(A1, A2, …, An) and solve it against the stacked right-hand side [b1; b2; …; bn], as in the sketch below.
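
A sketch of that approach, using the API from the CUDSS.jl README on a hypothetical batch of four systems:

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra, SparseArrays

# Hypothetical batch of sparse systems assembled on the CPU
As = [sprand(Float32, 1_000, 1_000, 1e-3) + I for _ in 1:4]
bs = [rand(Float32, 1_000) for _ in 1:4]

# One big block-diagonal system: diag(A1, ..., An) * x = [b1; ...; bn]
A = CuSparseMatrixCSR(blockdiag(As...))
b = CuVector(vcat(bs...))
x = CUDA.zeros(Float32, length(b))

# "G" = general (unsymmetric) matrix, 'F' = full storage
solver = CudssSolver(A, "G", 'F')
cudss("analysis", solver, x, b)
cudss("factorization", solver, x, b)
cudss("solve", solver, x, b)
```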
