How to solve a system of nonlinear equations on the GPU (in parallel)

Hello, I’m new to Julia and I’m trying to solve a large system of nonlinear equations (>768 equations, >200 unknowns). I was reading the NonlinearSolve.jl documentation, which mentions that automatic differentiation backends compatible with GPUs can be used, either by the solvers or independently of them. So I was wondering whether it’s truly possible to parallelize this system of equations. I haven’t found an explicit example where a system of equations runs in parallel, and although I tried the example below, it stopped working once I introduced several equations that don’t all have the same form. Does anyone have an example I could follow?

using NonlinearSolve
using CUDA

function quadratic!(du, u, p)
    @. du = u^2 - p
    return nothing
end

u0 = rand(Float32, 2)   # cu(u0) stores Float32 on the device, so start with Float32

prob = NonlinearProblem(quadratic!, cu(u0), 2.0f0)

@time solve(prob)

Or does anyone know of a method that would be better suited for this system?

That code there should just work. Can you share the error?

@avikpal, do you have example code to point to?

I think the OP tried a custom example, not the one above, in which case I am assuming the issue was with writing the function in a GPU-compatible manner.

"don’t have the same form"

It is worth noting that GPU threads need to run a single program across your data, so some sort of shared structure is needed.
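For example, here is a minimal sketch of that "single program, many data" pattern, reusing the quadratic example above but with a different parameter per entry (the batch size is an illustrative assumption):

using NonlinearSolve
using CUDA

function quadratic!(du, u, p)
    @. du = u^2 - p   # the same expression is applied to every entry
    return nothing
end

# 4096 independent copies of the same equation, each with its own parameter
p  = cu(rand(Float32, 4096) .+ 1.0f0)
u0 = cu(fill(1.0f0, 4096))

prob = NonlinearProblem(quadratic!, u0, p)
sol = solve(prob)

Every GPU thread runs the same expression; only the data (here, the parameters) differ per entry.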

Thanks for your reply.

That example worked, but (maybe it’s just me) when I try to pass it a system with several different equations, like

function my_system!(du, u, p)
    du1 = u[1]^2 - p[1]
    du2 = u[2] * u[3] + p[2]
    du3 = CUDA.sin(u[1]) * CUDA.cos(u[2])

    du = [du1, du2, du3]

    return nothing
end

it just doesn’t work.
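One concrete problem in that function: du = [du1, du2, du3] allocates a new array and rebinds the local name du, so the residual vector the solver passed in is never filled. An in-place version writes into the existing array. A sketch on the CPU (the p values are illustrative; note that the scalar du[i] = ... writes below are exactly what CuArrays disallow, so a single small heterogeneous system like this does not benefit from the GPU, which pays off when the same equations are evaluated across many data points):

using NonlinearSolve

function my_system!(du, u, p)
    # write into the caller's array instead of rebinding the local name
    du[1] = u[1]^2 - p[1]
    du[2] = u[2] * u[3] + p[2]
    du[3] = sin(u[1]) * cos(u[2])   # plain sin/cos suffice on the CPU
    return nothing
end

u0 = [1.0, 1.0, 1.0]
prob = NonlinearProblem(my_system!, u0, [2.0, 1.0])
sol = solve(prob)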

I tried to adapt an example I saw in the DifferentialEquations.jl documentation to write the function in a way that could be passed to the GPU (since, from what I had read, the way functions are declared in the SciML ecosystem should make them compatible), but that didn’t work either.

The code I tried:

using CUDA
using NonlinearSolve
using ForwardDiff
using Zygote

mutable struct prueba <: Function
    d_F1::CuArray{Float32, 1}   # intracellular calcium concentration
    d_F2::CuArray{Float32, 1}   # sodium current activation gate (m)
    d_F3::CuArray{Float32, 1}   # sodium current inactivation gate (h)
    d_x::CuArray{Float32, 1}    # sodium current inactivation gate (h)
    d_u::CuArray{Float64, 1}    # place-holder for d_u in the device memory

    function prueba(u0, p)
        self = new()

        nx = size(u0)

        #@assert (nx % 16 == 0)

        self.d_F1 = CuArray(fill(0.0001f0, nx))
        self.d_F2 = CuArray(fill(0.01f0, nx))
        self.d_F3 = CuArray(fill(0.988f0, nx))
        self.d_x  = CuArray(fill(0.988f0, nx))
        self.d_u  = CuArray(u0)

        return self
    end
end

function system1(x, p)
    F = CUDA.sin(x[1]) + x[2]^2 + CUDA.exp(x[3]) - 7
    return F
end

function system2(x, p)
    F = 3x[1] + 2x[2] - x[3]^2 - 4
    return F
end

function system3(x, p)
    F = x[1]^2 + x[2]^2 + x[3]^2 - 1
    return F
end

function core(u, F1, F2, F3, x, p)
    i = (blockIdx().x - UInt32(1)) * blockDim().x + threadIdx().x
    #j = (blockIdx().y - UInt32(1)) * blockDim().y + threadIdx().y
    #(x1, x2, x3) = x

    F1[i] = system1(x[i], p)
    F2[i] = system2(x[i], p)
    F3[i] = system3(x[i], p)

    nothing
end

function (f::prueba)(du, u, p)
    L = 16 * 128   # block size
    copyto!(f.d_u, u)
    nx = size(u)[1]

    #@cuda blocks=(nx ÷ L) threads=(L ÷ 128) core(f.d_u, 0, f.d_F1, f.d_F2, f.d_F3)
    @cuda blocks=(32) threads=(128) core(f.d_u, f.d_F1, f.d_F2, f.d_F3, f.d_x, 2)
end

u0 = zeros(3)
deriv_gpu = prueba(u0, 1.0);
prob = NonlinearProblem(deriv_gpu, u0);
#ODEProblem(deriv_gpu, u0, (0.0, 50.0))
@time sol = solve(prob, LevenbergMarquardt(; autodiff = AutoForwardDiff()));

With all the cases and variations I’ve tried, I never know what I’m doing wrong :c
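For what it’s worth, a few things in that attempt would make it fail: the kernel is launched with 32 × 128 = 4096 threads while the arrays only hold 3 elements, so F1[i] = ... indexes out of bounds; system1(x[i], p) passes a single scalar to functions that index x[1], x[2], x[3]; and the functor never copies the computed residuals back into du, so the solver receives nothing. Below is a sketch of the usual kernel structure, with one thread per independent copy of the whole 3-equation system and a bounds guard. The batch size and names are assumptions, and it only evaluates residuals rather than wiring into NonlinearSolve.jl, since the functor’s fixed Float32 buffers would drop ForwardDiff’s dual numbers anyway:

using CUDA

# residuals of one copy of the 3-equation system
system1(x) = sin(x[1]) + x[2]^2 + exp(x[3]) - 7
system2(x) = 3x[1] + 2x[2] - x[3]^2 - 4
system3(x) = x[1]^2 + x[2]^2 + x[3]^2 - 1

# one thread per copy; u and F are 3×N, column j holds one copy
function core!(F, u, N)
    j = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
    if j <= N                            # guard threads beyond the data
        x = (u[1, j], u[2, j], u[3, j])  # the whole copy, not one scalar
        F[1, j] = system1(x)
        F[2, j] = system2(x)
        F[3, j] = system3(x)
    end
    return nothing
end

N = 1024                                 # assumed batch size
u = CUDA.rand(Float32, 3, N)
F = CUDA.zeros(Float32, 3, N)
threads = 128
blocks = cld(N, threads)
@cuda blocks=blocks threads=threads core!(F, u, N)

For actually solving with NonlinearSolve.jl, the broadcast formulation shown earlier in the thread sidesteps hand-written kernels entirely and keeps the function compatible with GPU-capable autodiff backends, which is usually the easier route; with the functor above, LevenbergMarquardt(; autodiff = AutoForwardDiff()) cannot work because the dual numbers never reach the kernel.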