Is there a way to perform batched matrix inversion/ldiv in CUDA.jl, i.e.:
Yi = Ri \ Xi
where Ri is NxN, and Yi and Xi are NxM, with N and M small (~12). I need to solve a large number of these systems, e.g. i = 1, …, 50k. Obviously making a separate GPU call per system in a loop will be highly inefficient for small N and M, and CUDA provides several relevant batched routines, e.g. cusolverDn<t>potrsBatched() or cublas<t>getrsBatched().
I don’t believe the current CUDA.jl implementations of inv or ldiv (\) dispatch to these batched calls. Is there a workaround, or a roadmap for implementing this in CUDA.jl?
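For context, here is the kind of workaround I have been considering: CUDA.jl's low-level CUBLAS wrapper layer exposes some batched routines (the exact names and signatures below are my assumption from the wrapper layer and may differ across CUDA.jl versions, so treat this as a sketch rather than working code). The idea is to batch the Ri and Xi as vectors of small CuMatrix, LU-factorize and invert all Ri in one call, then apply a batched GEMM:

```julia
using CUDA, LinearAlgebra

N, M, B = 12, 12, 50_000

# Batch as Vector{CuMatrix}, the layout the batched wrappers expect.
# Shift the diagonal so each Ri is comfortably invertible in this demo.
Rs = [CUDA.rand(Float32, N, N) + 2f0 * N * CuArray{Float32}(I, N, N) for _ in 1:B]
Xs = [CUDA.rand(Float32, N, M) for _ in 1:B]

# Hypothetical wrapper calls (check your CUDA.jl version for the real API):
# one batched LU factorization, one batched inversion, one batched multiply.
pivots, _ = CUDA.CUBLAS.getrf_batched!(Rs, true)        # LU of every Ri in one launch
Rinvs = CUDA.CUBLAS.getri_batched(Rs, pivots)           # inv(Ri) for every i
Ys = CUDA.CUBLAS.gemm_batched('N', 'N', Rinvs, Xs)      # Yi = inv(Ri) * Xi
```

Explicit inversion is less numerically stable than a triangular solve, so a getrs-style batched solve would be preferable if the wrapper exists; for N ≈ 12 it may be acceptable, but I would rather have a proper batched ldiv.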