Hello to everyone,

I have a problem with NeuralPDE when trying to use the GPU on the following problem

```
using NeuralPDE
using Optimization, OptimizationOptimJL, OptimizationOptimisers
import ModelingToolkit: Interval
using Lux, LuxCUDA, ComponentArrays, Random
using QuasiMonteCarlo
println("using CUDA: ", CUDA.functional())
move_to_gpu = gpu_device()
@parameters q μ
@variables P(..)
# Differential operators
∂q = Differential(q)
∂qq = Differential(q)^2
∂μμ = Differential(μ)^2
# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)
# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)
# G(P) = 0
# PDE
eq = 2 * q^3 * ∂q(P(q, μ)) + q^4 * ∂qq(P(q, μ)) + q^2 * (1 - μ^2) * ∂μμ(P(q, μ)) ~ -G(P(q, μ))
# Domain limits
q_min = 0.0
q_max = 1.0
μ_min = -1.0
μ_max = 1.0
# Initial and boundary conditions
bcs = [
P(q_min, μ) ~ 0.0
, P(q_max, μ) ~ q_max * (1 - μ^2)
, P(q, μ_min) ~ 0.0
, P(q, μ_max) ~ 0.0
]
# Space and time domains
domains = [
q ∈ Interval(q_min, q_max),
μ ∈ Interval(μ_min, μ_max)
# ϕ ∈ Interval(ϕ_min, ϕ_max)
]
@named pde_system = PDESystem(eq, bcs, domains, [q, μ], [P(q, μ)])
inner = 20
chain = Chain(Dense(2, inner, tanh),
Dense(inner, inner, tanh),
# Dense(inner, inner, tanh),
# Dense(inner, inner, tanh),
Dense(inner, 1))
ps = Lux.setup(Random.default_rng(), chain)[1]
# ps = ps |> ComponentArray .|> Float64
ps = ps |> ComponentArray |> move_to_gpu .|> Float64
st = Lux.setup(Random.default_rng(), chain)[2]
iteration = Base.RefValue{Int}(1)
callback = function (p, l)
if iteration[] % 100 == 0 || iteration[] == 1
println("iteration ", iteration[], " loss: ", l)
end
iteration[] += 1
return false
end
points = 1000
strategy = NeuralPDE.QuasiRandomTraining(points
, bcs_points = points ÷ 4
, sampling_alg = SobolSample()
)
discretization = PhysicsInformedNN(chain
, strategy
, init_params = ps
)
prob = discretize(pde_system, discretization)
res = Optimization.solve(prob, OptimizationOptimisers.Adam(0.05);
callback = callback,
maxiters = 500)
prob = remake(prob, u0 = res.u)
res = Optimization.solve(prob, BFGS();
callback = callback,
maxiters = 1000)
phi = discretization.phi
```

After the first iteration, the loss returns NaNs.

If I don’t use the GPU and run everything on CPU, the network trains as it should and I get the desired result.

My attempts to find a solution have led me to the conclusion that the problem lies in the source term which has a term elevated to some power.

```
# Source term
Pc = 0.3405074
G(P) = 8 * (P - Pc)^3 * (P > Pc)
# Pc = 0.6
# G(P) = 4 * (P - Pc) * (P > Pc)
```

If I use some other source term that is linear (the commented line) I get the desired result without NaNs.

I found this issue on github that seems to have the same problem but the solution suggested cannot work in my case.

Is there any alternative of the power operator that avoids this issue? Note that I have tried multiply (P-Pc) three times but I have the same problem.

Thank you in advance for your help.