NeuralPDE on GPU throws NaNs when I use a source term raised to a power

Hello everyone,

I have a problem with NeuralPDE when trying to use the GPU on the following problem:

using NeuralPDE
using Optimization, OptimizationOptimJL, OptimizationOptimisers
import ModelingToolkit: Interval
using Lux, LuxCUDA, ComponentArrays, Random
using QuasiMonteCarlo


println("using CUDA: ", CUDA.functional())
move_to_gpu = gpu_device()

@parameters q μ
@variables P(..)

# Differential operators
∂q = Differential(q)
∂qq = Differential(q)^2
∂μμ = Differential(μ)^2

# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)

# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)

# G(P) = 0

# PDE
eq = 2 * q^3 * ∂q(P(q, μ)) + q^4 * ∂qq(P(q, μ)) +  q^2 * (1 - μ^2) * ∂μμ(P(q, μ)) ~ -G(P(q, μ))

# Domain limits
q_min = 0.0
q_max = 1.0
μ_min = -1.0
μ_max = 1.0

# Initial and boundary conditions
bcs = [
       P(q_min, μ) ~ 0.0
       , P(q_max, μ) ~ q_max * (1 - μ^2)
       , P(q, μ_min) ~ 0.0
       , P(q, μ_max) ~ 0.0
      ]

# Space and time domains
domains = [
           q ∈ Interval(q_min, q_max),
           μ ∈ Interval(μ_min, μ_max)
         #   ϕ ∈ Interval(ϕ_min, ϕ_max)
          ]
    

@named pde_system = PDESystem(eq, bcs, domains, [q, μ], [P(q, μ)])


# Neural network
inner = 20
chain = Chain(Dense(2, inner, tanh),
              Dense(inner, inner, tanh),
            #   Dense(inner, inner, tanh),
            #   Dense(inner, inner, tanh),
              Dense(inner, 1))

# Initialise the parameters and move them to the GPU as Float64
ps = Lux.setup(Random.default_rng(), chain)[1]
# ps = ps |> ComponentArray .|> Float64
ps = ps |> ComponentArray |> move_to_gpu .|> Float64
st = Lux.setup(Random.default_rng(), chain)[2]

# Callback that logs the loss on the first and then every 100th iteration
iteration = Base.RefValue{Int}(1)

callback = function (p, l)
   
   if iteration[] % 100 == 0 || iteration[] == 1
      println("iteration ", iteration[], " loss: ", l)
   end
   
   iteration[] += 1

   return false
end

# Quasi-random sampling strategy
points = 1000
strategy = NeuralPDE.QuasiRandomTraining(points
                                         , bcs_points = points ÷ 4
                                         , sampling_alg = SobolSample()
                                        )

discretization = PhysicsInformedNN(chain
                                   , strategy
                                   , init_params = ps
                                  )

prob = discretize(pde_system, discretization)

# Train with Adam first, then refine with BFGS
res = Optimization.solve(prob, OptimizationOptimisers.Adam(0.05);
                         callback = callback, 
                         maxiters = 500)
prob = remake(prob, u0 = res.u)
res = Optimization.solve(prob, BFGS(); 
                         callback = callback, 
                         maxiters = 1000)

phi = discretization.phi

After the first iteration, the loss becomes NaN.
If I don’t use the GPU and run everything on the CPU, the network trains as it should and I get the desired result.
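For completeness, the only change I make for the CPU run is using the commented parameter line from the script above instead of the GPU one, so the parameters stay on the CPU:

ps = ps |> ComponentArray .|> Float64   # parameters stay on the CPU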

My attempts to find a solution have led me to the conclusion that the problem lies in the source term, which contains a factor raised to a power.

# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)

# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)

If I instead use a source term that is linear in P (the commented lines above), I get the desired result without NaNs.

I found an issue on GitHub that seems to describe the same problem, but the solution suggested there does not work in my case.

Is there an alternative to the power operator that avoids this issue? Note that I have also tried multiplying (P - Pc) by itself three times, but I run into the same problem.
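For reference, the multiplied-out variant I tried is along these lines, and it NaNs in the same way:

# Same cubic written as a product of three factors; still NaNs on the GPU
G(P) = 8 * (P - Pc) * (P - Pc) * (P - Pc) * (P > Pc)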

Thank you in advance for your help.

I have found the solution to the problem, in case anyone else encounters something similar in the future.

Replacing

(P - Pc)^3

with

abs(P - Pc)^3

does the trick for any exponent.
I am not sure why this happens, and I doubt this fix is suitable for every use case, but it is what worked for me. In my problem it is safe, because the cube is multiplied by the indicator (P > Pc): G is only nonzero when P - Pc > 0, where abs(P - Pc)^3 and (P - Pc)^3 coincide, so the source term itself is unchanged.
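In context, the source term now reads:

# Source term; abs(...) avoids the NaNs on the GPU and is equivalent here,
# since the factor (P > Pc) makes G nonzero only when P - Pc > 0
Pc = 0.3405074
G(P) = 8 * abs(P - Pc)^3 * (P > Pc)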