 # Cuda cg + ilu?

Hi all,
I’m looking to use a GPU CG method with an ILU preconditioner. I’m currently using a CPU CG method through IterativeSolvers and ILU through IncompleteLU.jl . The CG method from IterativeSolvers is easily extended to GPUs however, some work is required on IncompleteLU.jl to implement it on a GPU. I think only `forward_substitution` and `backward_substitution` methods are required to extend this. I’ve tried using `sv2!` and `ilu02` with little success. Does anyone have any ideas?

do you need the ILU on GPU? You could just pass the decomposition to the GPU, see Complex Ginzburg-Landau 2d · Bifurcation Analysis in Julia

The issue is that the preconditioner requires the ‘ldiv!’ method as per https://julialinearalgebra.github.io/IterativeSolvers.jl/dev/preconditioning/. Unless I am mistaken?

The link you sent does look helpful though. Thank you for that

Here is some code:

``````using CUDA, CUDA.CUSPARSE, CUDA.CUSOLVER
using LinearAlgebra
using SparseArrays
using IncompleteLU
using IterativeSolvers
CUDA.allowscalar(false)

val = sprand(200,200,0.05);
A_cpu = val*val'
b_cpu = rand(200)

A_gpu = CuSparseMatrixCSC(A_cpu)
b_gpu = CuArray(b_cpu)

Precilu = ilu(A_cpu, τ = 3.0)

import Base: ldiv!
function LinearAlgebra.ldiv!(y::CuArray, P::LUgpu, x::CuArray)
copyto!(y, x)
sv2!('N', 'L', 1.0, P, y, 'O')
sv2!('N', 'U', 1.0, P, y, 'O')
return y
end

function LinearAlgebra.ldiv!(P::LUgpu, x::CuArray)
sv2!('N', 'L', 1.0, P, x, 'O')
sv2!('N', 'U', 1.0, P, x, 'O')
return x
end

# function LinearAlgebra.ldiv!(_lu::LUperso, rhs::CUDA.CuArray)
# 	_x = UpperTriangular(_lu.Ut) \ (LowerTriangular(_lu.L) \ rhs)
# 	rhs .= vec(_x)
# 	# CUDA.unsafe_free!(_x)
# 	rhs
# end
#
# function LinearAlgebra.ldiv!(yrhs::CuArray,_lu::LUperso, rhs::CuArray)
# 	copyto!(yrhs,rhs)
# 	_x = UpperTriangular(_lu.Ut) \ (LowerTriangular(_lu.L) \ rhs)
# 	rhs .= vec(_x)
# 	# CUDA.unsafe_free!(_x)
# 	rhs
# end

struct LUgpu
L
Ut	# transpose of U in LU decomposition
end

P = LUperso(LowerTriangular(CuSparseMatrixCSR(I+Precilu.L)), UpperTriangular(CuSparseMatrixCSR(sparse(Precilu.U'))));
val = cg(A_gpu,b_gpu,verbose=true,Pl=P,tol=10^-7,maxiter=1000)

P_cpu = ilu(A_cpu,τ=3.0)
val = cg(A_cpu,b_cpu;verbose=true,Pl=P_cpu,tol=10^-7,maxiter=1000)
A_cpu\b_cpu
``````