NeuralPDE on GPU throws NaNs when I use a source term raised to a power

Hello everyone,

I have a problem with NeuralPDE when trying to run the following problem on the GPU:

using NeuralPDE
using Optimization, OptimizationOptimJL, OptimizationOptimisers
import ModelingToolkit: Interval
using Lux, LuxCUDA, ComponentArrays, Random
using QuasiMonteCarlo


println("using CUDA: ", CUDA.functional())
move_to_gpu = gpu_device()

@parameters q μ
@variables P(..)

# Differential operators
∂q = Differential(q)
∂qq = Differential(q)^2
∂μμ = Differential(μ)^2

# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)

# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)

# G(P) = 0

# PDE
eq = 2 * q^3 * ∂q(P(q, μ)) + q^4 * ∂qq(P(q, μ)) +  q^2 * (1 - μ^2) * ∂μμ(P(q, μ)) ~ -G(P(q, μ))

# Domain limits
q_min = 0.0
q_max = 1.0
μ_min = -1.0
μ_max = 1.0

# Initial and boundary conditions
bcs = [
       P(q_min, μ) ~ 0.0
       , P(q_max, μ) ~ q_max * (1 - μ^2)
       , P(q, μ_min) ~ 0.0
       , P(q, μ_max) ~ 0.0
      ]

# Space and time domains
domains = [
           q ∈ Interval(q_min, q_max),
           μ ∈ Interval(μ_min, μ_max)
         #   ϕ ∈ Interval(ϕ_min, ϕ_max)
          ]
    

@named pde_system = PDESystem(eq, bcs, domains, [q, μ], [P(q, μ)])


inner = 20
chain = Chain(Dense(2, inner, tanh),
              Dense(inner, inner, tanh),
            #   Dense(inner, inner, tanh),
            #   Dense(inner, inner, tanh),
              Dense(inner, 1))

ps, st = Lux.setup(Random.default_rng(), chain)   # single setup call so parameters and states match
# ps = ps |> ComponentArray .|> Float64           # CPU variant
ps = ps |> ComponentArray |> move_to_gpu .|> Float64

iteration = Base.RefValue{Int}(1)

callback = function (p, l)
   
   if iteration[] % 100 == 0 || iteration[] == 1
      println("iteration ", iteration[], " loss: ", l)
   end
   
   iteration[] += 1

   return false
end

points = 1000
strategy = NeuralPDE.QuasiRandomTraining(points
                                         , bcs_points = points ÷ 4
                                         , sampling_alg = SobolSample()
                                        )

discretization = PhysicsInformedNN(chain
                                   , strategy
                                   , init_params = ps
                                  )

prob = discretize(pde_system, discretization)

res = Optimization.solve(prob, OptimizationOptimisers.Adam(0.05);
                         callback = callback, 
                         maxiters = 500)
prob = remake(prob, u0 = res.u)
res = Optimization.solve(prob, BFGS(); 
                         callback = callback, 
                         maxiters = 1000)

phi = discretization.phi

After the first iteration, the reported loss is NaN.
If I skip the GPU and run everything on the CPU, the network trains as it should and I get the desired result.
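
To narrow it down, here is the quick check I used (my own debugging sketch, not part of the script above; I am assuming the discretized OptimizationFunction is callable as prob.f(u, p), and that phi accepts a 2×N matrix of (q, μ) points — please correct me if the call signature is wrong):

# Debugging sketch (assumption: prob.f is callable as (u, p);
# phi(x, θ) takes a 2×N matrix of points and the parameter vector).
loss0 = prob.f(prob.u0, prob.p)
println("initial loss: ", loss0)

# sample a few points inside the domain (q ∈ [0,1], μ ∈ [-1,1]) on the GPU
xs = move_to_gpu(vcat(rand(1, 16), 2 .* rand(1, 16) .- 1))
ys = discretization.phi(xs, prob.u0)
println("any NaN in network output: ", any(isnan, Array(ys)))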

My attempts to find a solution have led me to the conclusion that the problem lies in the source term, which raises an expression to a power.

# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)

# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)

If I use another source term that is linear (the commented lines), I get the desired result without NaNs.
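
Part of why I suspect the power: even though the (P > Pc) factor should zero out the region where P - Pc is negative, a multiplicative mask cannot undo a NaN. If the cube itself ever evaluates to NaN on the GPU, multiplying by zero does not rescue it (plain-Julia illustration, nothing NeuralPDE-specific):

x = -0.1                 # stands in for P - Pc when P < Pc
mask = x > 0.0           # false, i.e. 0 when used in a product
println(0.0 * NaN)       # NaN — multiplying by zero does not remove a NaN
println(NaN * mask)      # NaN — so a NaN from the power term poisons the whole residual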

I found an issue on GitHub that seems to describe the same problem, but the suggested solution cannot work in my case.

Is there an alternative to the power operator that avoids this issue? Note that I have tried multiplying (P - Pc) by itself three times, but I hit the same problem.
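
For completeness, another formulation I considered is branching with ifelse instead of a multiplicative mask, so that even if the cube evaluates to NaN in the masked-out region, the selected value is still 0.0 (ifelse selects rather than multiplies). I have not verified that this traces cleanly through the Symbolics/NeuralPDE pipeline on the GPU, and NaNs in the discarded branch can still leak into gradients with some AD backends, so treat it only as a sketch:

# Untested sketch: ifelse-based source term instead of the boolean-mask product.
G_ifelse(P) = ifelse(P > Pc, 8 * (P - Pc)^3, 0.0)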

Thank you in advance for your help.

I have found the solution to the problem; I am posting it here in case anyone else encounters something similar in the future.

Replacing

(P - Pc)^3

with

abs(P - Pc)^3

does the trick for any exponent.
I am not sure why this is the case, and I doubt this fix is suitable for every use case, but it is what worked for me.
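
For anyone wondering whether abs changes the solution: it should not, as long as the (P > Pc) factor is there. For P > Pc the two expressions are identical, and for P ≤ Pc the mask is zero and abs merely keeps the base of the power non-negative, so the masked-out branch is a finite number instead of (apparently) a NaN. A quick plain-Julia check of the equivalence:

Pc = 0.3405074
G_orig(P) = 8 * (P - Pc)^3 * (P > Pc)
G_abs(P)  = 8 * abs(P - Pc)^3 * (P > Pc)

for P in (0.0, 0.2, Pc, 0.5, 1.0)
    @assert isapprox(G_orig(P), G_abs(P); atol = 1e-12)
end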

Hi, I applied our binary instrumentation tool GPU-FPX to this example (see my homepage at Utah [www dot cs dot utah dot edu / tilde ganesh] for a talk I recently gave at the ARRAY Workshop that describes it). It does shed some light on which kernels are involved in producing the exceptions.

I also ran your “fix”. It seems to produce an integer divide-by-zero after iteration 500. Did you see that? The loss is indeed “fixed” (no NaN), as you say, but even in this “fixed” version NaNs are being generated before iteration 1 itself.

Since I don’t know your code (I was merely looking for examples to try our tool on), please email me at the address on the webpage above, and perhaps we can develop not only an understanding of the issue but also help improve our tool? Thank you. Ganesh.

I won’t provide more than a few lines of what GPU-FPX produces below.

#GPU-FPX: Initializing GPU context…
No enable_kernels.txt !
No disable_kernels.txt !
#GPU-FPX: Instrument all kernels.
Running #GPU-FPX: kernel [gpu_broadcast_kernel_linear(CompilerMetadata] …
Running #GPU-FPX: kernel [gpu_fill_kernel_(CompilerMetadata] …
Running #GPU-FPX: kernel [gpu_linear_copy_kernel_(CompilerMetadata] …
Running #GPU-FPX: kernel [partial_mapreduce_grid(identity, _, Bool, CartesianIndices] …
Running #GPU-FPX: kernel [gpu_getindex_kernel(CompilerMetadata] …
Running #GPU-FPX: kernel [gpu_setindex_kernel(CompilerMetadata] …
Running #GPU-FPX: kernel [gpu_broadcast_kernel_cartesian(CompilerMetadata] …
#GPU-FPX-ANA SHARED REGISTER: Before executing the instruction @ ./int.jl:295 Instruction: MUFU.RCP64H R21, R19 ; We have 4 registers in total. Register 0 is VAL. Register 1 is NaN. Register 2 is VAL. Register 3 is VAL.

#GPU-FPX-ANA SHARED REGISTER: Before executing the instruction @ ./float.jl:491 Instruction: DFMA R12, RZ, R12, R16 ; We have 4 registers in total. Register 0 is VAL. Register 1 is NaN. Register 2 is VAL. Register 3 is VAL.

#GPU-FPX-ANA SHARED REGISTER: After executing the instruction @ ./float.jl:491 Instruction: DFMA R12, RZ, R12, R16 ; We have 4 registers in total. Register 0 is VAL. Register 1 is NaN. Register 2 is VAL. Register 3 is VAL.

…more omitted…