# Loss function in NeuralPDE.jl

Hi,
I’m trying to train a PINN using NeuralPDE.jl. I was able to start the training and achieved (at least in my opinion) a quiet small loss < 1e-7. But when I plotted the network solution against the analytic solution the two results looked very different.

That’s why I wanted to understand the loss function. Sadly I was not able to find anything about how it is definded, except that it is a combination of the boundary condition loss and the loss for the pde. So I would be very happy if someone here could help me understand how the loss is defined.

For you to better understand the problem, I have prepared my minimal working example:

using Lux, NeuralPDE, OptimizationOptimisers, Plots, Random
import ModelingToolkit: Interval

@parameters t
@variables u(..)

Dt = Differential(t)
eq = Dt(u(t)) ~ (u(t))^2 - (u(t))^3

δ = 0.01
bcs = [u(0.0) ~ δ]
domains = [t ∈ Interval(0.0, 2 / δ)]

chain = Lux.Chain(Dense(1, 8, Lux.σ), Dense(8, 8, Lux.σ), Dense(8, 1))

initParams, initState = Lux.setup(Random.default_rng(), chain)

discretization = PhysicsInformedNN(chain, QuasiRandomTraining(1000))
@named pde_system = PDESystem(eq, bcs, domains, [t], [u(t)])
prob = discretize(pde_system, discretization)

callback = function (params, loss_val)
ps, st = Lux.setup(Random.default_rng(), chain)
y, st = Lux.apply(chain, [0.5], params, st)
println("loss = ", loss_val)

# stop optimization when loss value is smaller than 1e-7
return loss_val < 1e-7
end

params_trained = solve(prob, optimiser, callback=callback, maxiters=2000, save_best=true)
state_trained = discretization.phi

analytic_sol_func(t) = 1 / (lambertw((1 / δ - 1) * exp(1 / δ - 1 - t)) + 1)

# plot result
dx = 0.001
xs = [infimum(d.domain):(dx/10):supremum(d.domain) for d in domains][1]
u_real = analytic_sol_func.(xs)
state_with_params(x) = state_trained(x, params_trained.u)
u_predict = first.(state_with_params.(xs))

x_plot = collect(xs)
plot(x_plot, u_real, label="real")
plot!(x_plot, u_predict, label="predict")


And here is an image of the final plot:

The blue line in the image is the analytical solution and the red/orange line represents the output of the network. And as you can see those two differ quiet much.

Hi @Vivi,

if you also compute symbolic_discretize(pde_system, discretization), you can observe the created loss functions in the returned symbolic problem structure.

Thank you very much @drsk.

I took a look in the output of the symbolic problem structure. As I’m quite new I don’t really understand how I can call this loss function to gain better understanding i.e. with the debugger. My idea was to look if the loss is nearly 0 when I put in the analytic solution.

For example I looked at the output of symbolic_discretize(pde_system, discretization).loss_functions.pde_loss_functions which is

NeuralPDE.var"#92#95"{NeuralPDE.var"#219#220"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:cord, Symbol("##θ#293"), :phi, :derivative, :integral, :u, :p), NeuralPDE.var"#_RGF_ModTag", NeuralPDE.var"#_RGF_ModTag", (0x4842fd3a, 0xfdea5419, 0x88b3ff40, 0x144fdfca, 0xc206fec8), Expr}, NeuralPDE.var"#12#13", NeuralPDE.var"#279#286"{NeuralPDE.var"#279#280#287"{typeof(NeuralPDE.numeric_derivative)}, Dict{Symbol, Int64}, Dict{Symbol, Int64}, QuasiRandomTraining}, typeof(NeuralPDE.numeric_derivative), NeuralPDE.Phi{Chain{NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{Dense{true, typeof(sigmoid_fast), typeof(glorot_uniform), typeof(zeros32)}, Dense{true, typeof(sigmoid_fast), typeof(glorot_uniform), typeof(zeros32)}, Dense{true, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}}}, Nothing}, NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}}}}, Nothing}, Tuple{Vector{Float64}, Vector{Float64}}, Int64, Qua…


I assume the first parameters like cord, phi,… are the input arguments to this loss function, but what do they represent?

Or asked the other way around, as I want to learn this simple ODE function, does anyone has tips why my results are so bad. Is the loss function really the problem or am I missing something else?

What does the output of the solver say? You should see the actual loss printed for every iteration. Your solver stops after a maximum of 2000 iterations, hence it could be that you haven’t converged yet. Also note that the loss function depends on the actual training strategy used.

Hi, sorry for the late response, I didn’t have time for this project. But now I’m back. I do not totally understand what you mean by solver output. I print in the callback the loss value for every iteration and this is quite small. If you mean something else, I would really appreciate if you could explain it a bit more.

I also did a training with more iterations in my case 12 000 and the loss is 5e-9 (I changed the return in the callback to return false). Do you suggest to do even more iterations for convergence? And if yes do you have a suggestion on how many I need approximately?

For the training strategy, I think I use QuasiRandomTraining.

I meant indeed the output of your loss function during training. Did you try different strategies? The loss function of QuasiRandomTraining is mean(abs2, loss_function(sets__, θ)), where the inner loss function is computed from your equation. But this depends also on how you sample. In your example, the loss seems zero almost everywhere outside [70, 110] and concentrated on a fairly small interval. I’d try a QuadratureTraining and see if you get a better result.

Thank you, I will try different strategies the next days.
I already tried QuadratureTraining, but the results are nearly the same as before.
I have the feeling, that my boundary condition is not considered in the loss evaluation and I think that’s the problem for those bad results, as f(t) = 0 would be a solution when ignoring the boundary condition. To try that I set the weights for the equation and the boundary condition (pde_loss_weights, bc_loss_weigths) and even when the bc_loss_weigths is much higher, the boundary condition seems not be considered. I even tried pde_loss_weights = 0. Do you have an idea if this could be the problem and how to solve it ?

Looking at your equation closer, I think the solution will be very sensitive to the initial condition u(0) = \delta, and your delta is already tiny. Can you see what happens when you choose a delta with a different magnitude, say delta ~ 0.5? I don’t think there is a problem in NeuralPDE, but just an extremely high sensitivity of the solutions to the initial condition.

Hi, you are right, when I tune the delta parameter to 0.5 the result approaches the analytical solution, not totally but at least it is near. So I guess it’s not a problem in NeuralPDE but in my system instead. Thanks for your help !