Using getrf_batched to find matrix inverses

JackC · August 7, 2025, 1:58pm

Hello,

I am pretty new to GPU programming in general, so apologies in advance for sounding clueless. I need to find the inverses of a bunch of relatively small matrices on a GPU using LU factorization. I’ve been using the CUDA package but have run into an issue running the batched versions of the getrf and getrs CUDA wrappers.

I have been able to get the non batched version to work fine (thanks to an example on Google, code included).

using CUDA, LinearAlgebra # LinearAlgebra provides I

m = rand(100, 100) # Example 100x100 matrix
A = CuArray(m)

B = CuArray(Matrix{eltype(A)}(I(size(A,1))))
A_factored, ipiv = CUDA.CUSOLVER.getrf!(A)

inverse_matrix_gpu = Matrix{eltype(A)}(CUDA.CUSOLVER.getrs!('N', A_factored, ipiv, B))

This works fine for me. However, I now need to run some sort of batched version on an n×n×m array. I figured CUDA.CUBLAS.getrf_batched! would be the way to go about it, but… well, I have no idea how to get it to work. I figured the inputs would be similar, but they are not: it requires additional inputs that aren’t clearly outlined in the documentation or code (at least that I can find). As an example, when I try to pass a 3-dimensional CuArray:

A = CUDA.rand(100,100,1024)

CUDA.CUBLAS.getrf_batched!(A)

I end up with this error:

ERROR: MethodError: no method matching getrf_batched!(::CuArray{Float32, 3, CUDA.DeviceMemory})
The function `getrf_batched!` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  getrf_batched!(::Any, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
   @ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938
  getrf_batched!(::Any, ::CuArray{CuPtr{ComplexF64}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
   @ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938
  getrf_batched!(::Any, ::CuArray{CuPtr{Float32}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
   @ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938

So, clearly methods exist for this, but I just have no idea how to actually set things up properly. If anyone knows how to properly set up this functionality or can point me towards the appropriate documentation, their help would be greatly appreciated.

Thanks.

simeonschaub · August 7, 2025, 2:11pm

For an n×n×m array, you want the strided batched version of getrf! (unfortunately, cuBLAS doesn’t have an analog for this internally, so it will still allocate an array of device pointers). See the tests for an example of how to use it; the easiest way is just to pass true for the pivot:

github.com/JuliaGPU/CUDA.jl, test/libraries/cublas/extensions.jl (205c238e5)

@testset "getrf_strided_batched!" begin
    Random.seed!(1)
    local k
    # generate strided matrix
    A = rand(elty,m,m,10)
    # move to device
    d_A = CuArray(A)
    # testing without pivoting quickly results in inaccuracies along the diagonal
    # test with pivoting
    pivot, info = CUBLAS.getrf_strided_batched!(d_A, true)
    h_info = Array(info)
    h_pivot = Array(pivot)
    for As in 1:size(d_A, 3)
        C   = lu(A[:,:,As])
        h_A = Array(d_A[:,:,As])
        #reconstruct L,U
        dL = Matrix(one(elty)*I, m, m)
        dU = zeros(elty,(m,m))
        k = h_info[As]
        if( k >= 0 )

JackC · August 7, 2025, 3:45pm

Ah perfect, I was able to do the inversion with a similar setup using getrs_strided_batched!. Thanks!
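
For anyone who wants to sanity-check a batched GPU inverse, a CPU reference is handy. The following is only a minimal sketch and not code from this thread: `batched_inverse_cpu` is a hypothetical helper name, and it uses nothing but the LinearAlgebra standard library. It factors each slice with LU (the per-slice analogue of getrf) and solves against the identity (the analogue of getrs). The exact argument order of getrs_strided_batched! is not shown in this thread, so it is best checked in lib/cublas/wrappers.jl for the installed CUDA.jl version.

```julia
using LinearAlgebra

# Hypothetical helper (not part of CUDA.jl): CPU reference that inverts every
# n×n slice of an n×n×m array via LU factorization with partial pivoting.
function batched_inverse_cpu(A::AbstractArray{T,3}) where {T}
    n, n2, m = size(A)
    n == n2 || throw(DimensionMismatch("each slice must be square"))
    Ainv = similar(A)
    Id = Matrix{T}(I, n, n)
    for k in 1:m
        F = lu(A[:, :, k])       # factorization step (getrf analogue)
        Ainv[:, :, k] = F \ Id   # solve against the identity (getrs analogue)
    end
    return Ainv
end

# Small example batch; the diagonal shift keeps every slice well conditioned.
A = rand(Float32, 4, 4, 8)
for k in 1:size(A, 3)
    A[:, :, k] += 4I
end
Ainv = batched_inverse_cpu(A)
```

The result copied back from the device with Array(...) can then be compared against Ainv slice by slice with isapprox, using a tolerance appropriate for Float32.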
