How to solve a system of nonlinear equations on the GPU (in parallel)

Hello, I’m new to Julia and I’m trying to solve a large system of nonlinear equations (>768 equations, >200 unknowns). I was reading the NonlinearSolve.jl documentation, which mentions that automatic differentiation backends compatible with GPUs can be used, either by the solvers or independently of them. So I was wondering whether it’s truly possible to parallelize this system of equations. I haven’t found an explicit example where a system of equations runs in parallel, and although I tried the example below, it stopped working once I introduced several equations that don’t all have the same form. Does anyone have an example I could follow?

using NonlinearSolve
using CUDA

function quadratic!(du, u, p)
    @. du = u^2 - p
    return nothing
end

u0 = rand(Float32, 2)   # cu(u0) stores Float32 on the device, so start with Float32

prob = NonlinearProblem(quadratic!, cu(u0), 2.0f0)

@time solve(prob)

Or does anyone know of a method that would be better suited for this system?

That code there should just work. Can you share the error?

@avikpal, do you have example code to point to?

I think the OP tried a custom example, not the one above, in which case I am assuming the issue was with writing the function in a GPU-compatible manner.

"don’t have the same form"

It is worth noting that GPU threads need to run a single program across your data, so some sort of shared structure is needed.
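For example, here is a minimal sketch of that "single program, many data" pattern, reusing the quadratic example above but with a different parameter per entry (the batch size is an illustrative assumption):

using NonlinearSolve
using CUDA

function quadratic!(du, u, p)
    @. du = u^2 - p   # the same expression is applied to every entry
    return nothing
end

# 4096 independent copies of the same equation, each with its own parameter
p  = cu(rand(Float32, 4096) .+ 1.0f0)
u0 = cu(fill(1.0f0, 4096))

prob = NonlinearProblem(quadratic!, u0, p)
sol = solve(prob)

Every GPU thread runs the same expression; only the data (here, the parameters) differ per entry.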

Thanks for your reply.

That example worked, but (maybe it’s just me) when I try to pass it a system with several different equations, like

function my_system!(du, u, p)
    du1 = u[1]^2 - p[1]
    du2 = u[2] * u[3] + p[2]
    du3 = CUDA.sin(u[1]) * CUDA.cos(u[2])

    du = [du1, du2, du3]

    return nothing
end

it just doesn’t work.
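One concrete problem in that function: du = [du1, du2, du3] allocates a new array and rebinds the local name du, so the residual vector the solver passed in is never filled. An in-place version writes into the existing array. A sketch on the CPU (the p values are illustrative; note that the scalar du[i] = ... writes below are exactly what CuArrays disallow, so a single small heterogeneous system like this does not benefit from the GPU, which pays off when the same equations are evaluated across many data points):

using NonlinearSolve

function my_system!(du, u, p)
    # write into the caller's array instead of rebinding the local name
    du[1] = u[1]^2 - p[1]
    du[2] = u[2] * u[3] + p[2]
    du[3] = sin(u[1]) * cos(u[2])   # plain sin/cos suffice on the CPU
    return nothing
end

u0 = [1.0, 1.0, 1.0]
prob = NonlinearProblem(my_system!, u0, [2.0, 1.0])
sol = solve(prob)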

I tried to adapt an example I saw in the DifferentialEquations.jl documentation to write the function in a way that could be passed to the GPU (since, from what I had read, the way functions are declared in the SciML ecosystem should make them compatible), but that didn’t work either.

The code I tried:

using CUDA
using NonlinearSolve
using ForwardDiff
using Zygote

mutable struct prueba <: Function
    d_F1::CuArray{Float32, 1}   # intracellular calcium concentration
    d_F2::CuArray{Float32, 1}   # sodium current activation gate (m)
    d_F3::CuArray{Float32, 1}   # sodium current inactivation gate (h)
    d_x::CuArray{Float32, 1}    # sodium current inactivation gate (h)
    d_u::CuArray{Float64, 1}    # place-holder for d_u in the device memory

    function prueba(u0, p)
        self = new()

        nx = size(u0)

        #@assert (nx % 16 == 0)

        self.d_F1 = CuArray(fill(0.0001f0, nx))
        self.d_F2 = CuArray(fill(0.01f0, nx))
        self.d_F3 = CuArray(fill(0.988f0, nx))
        self.d_x  = CuArray(fill(0.988f0, nx))
        self.d_u  = CuArray(u0)

        return self
    end
end

function system1(x, p)
    F = CUDA.sin(x[1]) + x[2]^2 + CUDA.exp(x[3]) - 7
    return F
end

function system2(x, p)
    F = 3x[1] + 2x[2] - x[3]^2 - 4
    return F
end

function system3(x, p)
    F = x[1]^2 + x[2]^2 + x[3]^2 - 1
    return F
end

function core(u, F1, F2, F3, x, p)
    i = (blockIdx().x - UInt32(1)) * blockDim().x + threadIdx().x
    #j = (blockIdx().y - UInt32(1)) * blockDim().y + threadIdx().y
    #(x1, x2, x3) = x

    F1[i] = system1(x[i], p)
    F2[i] = system2(x[i], p)
    F3[i] = system3(x[i], p)

    nothing
end

function (f::prueba)(du, u, p)
    L = 16 * 128   # block size
    copyto!(f.d_u, u)
    nx = size(u)[1]

    #@cuda blocks=(nx ÷ L) threads=(L ÷ 128) core(f.d_u, 0, f.d_F1, f.d_F2, f.d_F3)
    @cuda blocks=(32) threads=(128) core(f.d_u, f.d_F1, f.d_F2, f.d_F3, f.d_x, 2)
end

u0 = zeros(3)
deriv_gpu = prueba(u0, 1.0);
prob = NonlinearProblem(deriv_gpu, u0);
#ODEProblem(deriv_gpu, u0, (0.0, 50.0))
@time sol = solve(prob, LevenbergMarquardt(; autodiff = AutoForwardDiff()));

With all the cases and variations I’ve tried, I never know what I’m doing wrong :c
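For what it’s worth, a few things in that attempt would make it fail: the kernel is launched with 32 × 128 = 4096 threads while the arrays only hold 3 elements, so F1[i] = ... indexes out of bounds; system1(x[i], p) passes a single scalar to functions that index x[1], x[2], x[3]; and the functor never copies the computed residuals back into du, so the solver receives nothing. Below is a sketch of the usual kernel structure, with one thread per independent copy of the whole 3-equation system and a bounds guard. The batch size and names are assumptions, and it only evaluates residuals rather than wiring into NonlinearSolve.jl, since the functor’s fixed Float32 buffers would drop ForwardDiff’s dual numbers anyway:

using CUDA

# residuals of one copy of the 3-equation system
system1(x) = sin(x[1]) + x[2]^2 + exp(x[3]) - 7
system2(x) = 3x[1] + 2x[2] - x[3]^2 - 4
system3(x) = x[1]^2 + x[2]^2 + x[3]^2 - 1

# one thread per copy; u and F are 3×N, column j holds one copy
function core!(F, u, N)
    j = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
    if j <= N                            # guard threads beyond the data
        x = (u[1, j], u[2, j], u[3, j])  # the whole copy, not one scalar
        F[1, j] = system1(x)
        F[2, j] = system2(x)
        F[3, j] = system3(x)
    end
    return nothing
end

N = 1024                                 # assumed batch size
u = CUDA.rand(Float32, 3, N)
F = CUDA.zeros(Float32, 3, N)
threads = 128
blocks = cld(N, threads)
@cuda blocks=blocks threads=threads core!(F, u, N)

For actually solving with NonlinearSolve.jl, the broadcast formulation shown earlier in the thread sidesteps hand-written kernels entirely and keeps the function compatible with GPU-capable autodiff backends, which is usually the easier route; with the functor above, LevenbergMarquardt(; autodiff = AutoForwardDiff()) cannot work because the dual numbers never reach the kernel.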