Loss function in NeuralPDE.jl

I’m trying to train a PINN using NeuralPDE.jl. I was able to start the training and achieved (at least in my opinion) a quiet small loss < 1e-7. But when I plotted the network solution against the analytic solution the two results looked very different.

That’s why I wanted to understand the loss function. Sadly I was not able to find anything about how it is definded, except that it is a combination of the boundary condition loss and the loss for the pde. So I would be very happy if someone here could help me understand how the loss is defined.

For you to better understand the problem, I have prepared my minimal working example:

using Lux, NeuralPDE, OptimizationOptimisers, Plots, Random
import ModelingToolkit: Interval

@parameters t
@variables u(..)

Dt = Differential(t)
eq = Dt(u(t)) ~ (u(t))^2 - (u(t))^3

δ = 0.01
bcs = [u(0.0) ~ δ]
domains = [t ∈ Interval(0.0, 2 / δ)]

chain = Lux.Chain(Dense(1, 8, Lux.σ), Dense(8, 8, Lux.σ), Dense(8, 1))

initParams, initState = Lux.setup(Random.default_rng(), chain)
optimiser = OptimizationOptimisers.Adam(0.01)

discretization = PhysicsInformedNN(chain, QuasiRandomTraining(1000))
@named pde_system = PDESystem(eq, bcs, domains, [t], [u(t)])
prob = discretize(pde_system, discretization)

callback = function (params, loss_val)
    ps, st = Lux.setup(Random.default_rng(), chain)
    y, st = Lux.apply(chain, [0.5], params, st)
    println("loss = ", loss_val)

    # stop optimization when loss value is smaller than 1e-7
    return loss_val < 1e-7

params_trained = solve(prob, optimiser, callback=callback, maxiters=2000, save_best=true)
state_trained = discretization.phi

analytic_sol_func(t) = 1 / (lambertw((1 / δ - 1) * exp(1 / δ - 1 - t)) + 1)

# plot result
dx = 0.001
xs = [infimum(d.domain):(dx/10):supremum(d.domain) for d in domains][1]
u_real = analytic_sol_func.(xs)
state_with_params(x) = state_trained(x, params_trained.u)
u_predict = first.(state_with_params.(xs))

x_plot = collect(xs)
plot(x_plot, u_real, label="real")
plot!(x_plot, u_predict, label="predict")

And here is an image of the final plot:

Eingefügtes Bild

The blue line in the image is the analytical solution and the red/orange line represents the output of the network. And as you can see those two differ quiet much.

Hi @Vivi,

if you also compute symbolic_discretize(pde_system, discretization), you can observe the created loss functions in the returned symbolic problem structure.

Thank you very much @drsk.

I took a look in the output of the symbolic problem structure. As I’m quite new I don’t really understand how I can call this loss function to gain better understanding i.e. with the debugger. My idea was to look if the loss is nearly 0 when I put in the analytic solution.

For example I looked at the output of symbolic_discretize(pde_system, discretization).loss_functions.pde_loss_functions which is

NeuralPDE.var"#92#95"{NeuralPDE.var"#219#220"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:cord, Symbol("##θ#293"), :phi, :derivative, :integral, :u, :p), NeuralPDE.var"#_RGF_ModTag", NeuralPDE.var"#_RGF_ModTag", (0x4842fd3a, 0xfdea5419, 0x88b3ff40, 0x144fdfca, 0xc206fec8), Expr}, NeuralPDE.var"#12#13", NeuralPDE.var"#279#286"{NeuralPDE.var"#279#280#287"{typeof(NeuralPDE.numeric_derivative)}, Dict{Symbol, Int64}, Dict{Symbol, Int64}, QuasiRandomTraining}, typeof(NeuralPDE.numeric_derivative), NeuralPDE.Phi{Chain{NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{Dense{true, typeof(sigmoid_fast), typeof(glorot_uniform), typeof(zeros32)}, Dense{true, typeof(sigmoid_fast), typeof(glorot_uniform), typeof(zeros32)}, Dense{true, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}}}, Nothing}, NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}, NamedTuple{(), Tuple{}}}}}, Nothing}, Tuple{Vector{Float64}, Vector{Float64}}, Int64, Qua…

I assume the first parameters like cord, phi,… are the input arguments to this loss function, but what do they represent?

Or asked the other way around, as I want to learn this simple ODE function, does anyone has tips why my results are so bad. Is the loss function really the problem or am I missing something else?

What does the output of the solver say? You should see the actual loss printed for every iteration. Your solver stops after a maximum of 2000 iterations, hence it could be that you haven’t converged yet. Also note that the loss function depends on the actual training strategy used.

Hi, sorry for the late response, I didn’t have time for this project. But now I’m back. I do not totally understand what you mean by solver output. I print in the callback the loss value for every iteration and this is quite small. If you mean something else, I would really appreciate if you could explain it a bit more.

I also did a training with more iterations in my case 12 000 and the loss is 5e-9 (I changed the return in the callback to return false). Do you suggest to do even more iterations for convergence? And if yes do you have a suggestion on how many I need approximately?

For the training strategy, I think I use QuasiRandomTraining.

I meant indeed the output of your loss function during training. Did you try different strategies? The loss function of QuasiRandomTraining is mean(abs2, loss_function(sets__, θ)), where the inner loss function is computed from your equation. But this depends also on how you sample. In your example, the loss seems zero almost everywhere outside [70, 110] and concentrated on a fairly small interval. I’d try a QuadratureTraining and see if you get a better result.