Hello,
I am pretty new to GPU programming in general, so apologies in advance for sounding clueless. I have a problem where I need to find the inverses of a bunch of relatively small matrices on a GPU using LU factorization. I’ve been using the CUDA package but have run into an issue running the batched versions of the getrf and getrs CUDA wrappers.
I have been able to get the non-batched version to work fine (thanks to an example on Google; code included):
using CUDA, LinearAlgebra # LinearAlgebra provides the identity `I` used below
m = rand(100, 100) # Example 100x100 matrix
A = CuArray(m)
B = CuArray(Matrix{eltype(A)}(I(size(A,1))))
A_factored, ipiv = CUDA.CUSOLVER.getrf!(A)
inverse_matrix_gpu = Matrix{eltype(A)}(CUDA.CUSOLVER.getrs!('N', A_factored, ipiv, B))
This works fine for me. However, I now need to run some sort of batched version on an n×n×m array. I figured CUDA.CUBLAS.getrf_batched! would be the way to go, but… well, I have no idea how to get it to work. I figured the inputs would be similar to the non-batched call, but they are not: it requires additional inputs that aren’t clearly outlined in the documentation or code (at least not that I can find). As an example, when I try to pass a 3-dimensional CuArray:
A = CUDA.rand(100,100,1024)
CUDA.CUBLAS.getrf_batched!(A)
I end up with this error:
ERROR: MethodError: no method matching getrf_batched!(::CuArray{Float32, 3, CUDA.DeviceMemory})
The function `getrf_batched!` exists, but no method is defined for this combination of argument types.
Closest candidates are:
getrf_batched!(::Any, ::CuArray{CuPtr{ComplexF32}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
@ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938
getrf_batched!(::Any, ::CuArray{CuPtr{ComplexF64}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
@ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938
getrf_batched!(::Any, ::CuArray{CuPtr{Float32}, 1}, ::Any, ::TP, ::TI) where {TP<:Union{Bool, CuArray{<:Any, 2}}, TI<:Union{Nothing, CuArray{Int32}}}
@ CUDA C:\Users\user\.julia\packages\CUDA\1kIOw\lib\cublas\wrappers.jl:1938
So, clearly methods exist for this, but I just have no idea how to actually set things up properly. If anyone knows how to properly set up this functionality or can point me towards the appropriate documentation, their help would be greatly appreciated.
Thanks.