Troubleshooting an Inverse Problem with PINNs for Wind Turbine Aerodynamics using NeuralPDE.jl

Hello all,

I’m trying to estimate 10 parameters (c1 to c10) of an ODE representing the aerodynamic model of a wind turbine, and I’ve decided to do it via Bayesian inference using physics-informed neural networks (PINNs) with the NeuralPDE.jl package. The code attached below is a modification of the tutorial Parameter Estimation with PINNs for ODEs · NeuralPDE.jl, replacing the Lotka-Volterra system with the wind turbine aerodynamic model (and considering 3 exogenous inputs: Tg, θ and vr). For now, I decided to use NNODE() instead of BNNODE() to avoid having to deal with priors.

My issue is the following: why does the PINN estimation basically kill my dynamics (see attached pictures)? It seems that, rather than adjusting the parameters to better match the data, the PINN makes minimal modifications to them and averages out the dynamics instead. Could it be that the neural network is currently defined as a map from time to my state variable (wr), when it should be a map from time AND inputs to my state variable, or am I mistaken? How could this be done?

I was also going to try another strategy based on the Universal Differential Equations example Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations · Overview of Julia's SciML (so instead of assuming a parametric model for Cp(λ,θ) and estimating its parameters, I would define Cp as a neural network) to see if that works better, but I’m quite sure it should be possible to obtain much better results for such a simple ODE with PINNs.

Thank you in advance!

using NeuralPDE, Lux, Plots, OrdinaryDiffEq, Distributions, Random
using XLSX, DataInterpolations
using OptimizationOptimJL, LineSearches

# Define your wind turbine model with exogenous inputs
function wind_turbine(u, p, t)
    # Extract model parameters
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10 = p
    
    # Extract current state (rotor speed wr, rad/s)
    wr = u
    
    # Define constants
    μd = 0.05
    Rr = 120.998
    ρ = 1.225
    Ar = π * Rr^2
    Jr = 321699000
    Jg = 3.777e6
    x = 0  # Exponent of the pitch term in the Cp approximation (θ^0 = 1 here)
    
    # Calculate tip-speed ratio λ
    λ = wr * Rr / vr_interp(t)
    
    # Calculate 1/λi (intermediate term of the empirical Cp(θ, λ) approximation)
    λi_inv = 1 / (λ + c9 * θ_interp(t)) - c10 / (θ_interp(t)^3 + 1)
    
    # Calculate power coefficient Cp (numerical approx. for Cp(θ, λ))
    Cp = max(0, c1 * (c2 * λi_inv - c3 * θ_interp(t) - c4 * 1/λi_inv * θ_interp(t) - c5 * θ_interp(t)^x - c6) * exp(-c7 * λi_inv) + c8 * λ)

    # Aerodynamic rotor torque: Tr = P / wr with P = 1/2 * ρ * Ar * vr^3 * Cp
    Tr = 1 / (2 * wr) * ρ * Ar * vr_interp(t)^3 * Cp
    
    # Evaluate differential equation for rotor speed
    dwr = ((1 - μd) * Tr - Tg_interp(t)) / (Jr + Jg)
    
    return dwr
end

# Define initial-value problem.
wr0 = 0.8246
u0 = wr0  # Initial state (rotor speed, rad/s)
p = [0.5, 116, 0.4, 0, 0, 5, 21, 0.005, 0.089, 0.035]  # Model parameters
tspan = (30.0, 229.95)

# Load data from Excel sheet
function load_data(filename)
    data = XLSX.readxlsx(filename)["Ark1"]  # Data is in sheet "Ark1"
    wr = data["A"][1:4000]  # Assuming column A contains angular speed data (rad/s)
    Tg = data["B"][1:4000]  # Assuming column B contains generator torque data (Nm)
    θ = data["F"][1:4000]  # Assuming column F contains pitch angle data (degrees)
    vr = data["D"][1:4000]  # Assuming column D contains wind speed data (m/s)
    return wr, Tg, θ, vr
end

# Load data from Excel sheet
filename = "…\\data_14ms.xlsx" 
wr_data, Tg_data, θ_data, vr_data = load_data(filename)

# Convert eltype Any to Float64 (needed for interpolation)
wr_data = map(Float64, wr_data)
Tg_data = map(Float64, Tg_data)
θ_data = map(Float64, θ_data)
vr_data = map(Float64, vr_data)

t_data = range(tspan[1], tspan[2], length=4000)  # Time points

# Create function by interpolating inputs
Tg_interp = BSplineApprox(Tg_data, t_data, 49, 50, :ArcLen, :Average)
θ_interp = BSplineApprox(θ_data, t_data, 49, 50, :ArcLen, :Average)
vr_interp = BSplineApprox(vr_data, t_data, 49, 50, :ArcLen, :Average)

prob_model = ODEProblem(wind_turbine, u0, tspan, p)

# Solve the ODE with an auto-switching algorithm (Tsit5 for the non-stiff parts,
# Rosenbrock23 for the stiff parts), saving the solution at the sampling times
dt = 0.05  # Sampling time
sol_model = solve(prob_model, AutoTsit5(Rosenbrock23()), saveat=dt)
t_ = sol_model.t
u_ = sol_model.u

#plot(sol_model, xlabel="Time", ylabel="Rotor Speed", label="Rotor speed (model)")
#plot!(t_data, wr_data, label="Rotor speed (data)")
#plot!(legend=:topright)

# Define NN architecture
rng = Random.default_rng()
Random.seed!(rng, 0)
n = 20
chain = Lux.Chain(
            Lux.Dense(1, n, Lux.σ),
            Lux.Dense(n, n, Lux.σ),
            Lux.Dense(n, n, Lux.σ),
            Lux.Dense(n, 1)
        )
ps, st = Lux.setup(rng, chain) |> Lux.f64

# additional_loss(phi, θ): phi is the neural network trial solution, θ are the weights of the neural network(s)
function additional_loss(phi, θ) # should it include p?
    return sum(abs2, phi(t_data, θ) .- wr_data)/size(wr_data, 1)
end

opt = LBFGS(linesearch = BackTracking())
alg_NN = NNODE(chain, opt, ps; strategy = WeightedIntervalTraining([0.7, 0.2, 0.1], 500), param_estim = true, additional_loss = additional_loss)

sol_NN = solve(prob_model, alg_NN, verbose = true, abstol = 1e-6, maxiters = 500, saveat = t_data)

# Plot the PINN solution for wr
plot(sol_NN, label = "estimated wr")

# comparing it with the original solution
plot!(sol_model, label = "model wr")
plot!(t_data, wr_data, label = "data wr")

# Check estimated parameters
sol_NN.k.u.p

Estimated parameters (last line):

10-element view(::Vector{Float64}, 902:911) with eltype Float64:
   0.500109466140297
 115.99999759675735
   0.40097664757801016
   0.005697616093540125
  -1.2050434295049353e-5
   4.999987949565712
  21.000018552553325
   0.00011821424564469212
   0.08842105778473758
   0.03504212549242426

Last plot (comparison between model simulation, data and PINN estimation):
[plot attachment]

I’ll attach the data below for reproducibility.

Excel file with data: [attachment]

For comparison purposes, I attach a figure with standard parameter estimation on my ODE using just 30 iterations of a Levenberg-Marquardt least-squares search in MATLAB. The differences observed in the red curve are due to a slightly different initial parameter guess in this specific test, and the data has a bit less noise, but that is irrelevant. This is the kind of behavior I expected to see. If least squares achieves this, parameter estimation using PINNs or BPINNs should be at least as good, so I suspect there is something in my implementation that is not quite right.


Your ODE solution there leads me to believe you have an error in that implementation. You plot the solution and it’s constant. If you print your derivative values, are they zero?

The derivative values look like this: [plot attachment]

I think at least one of the issues is that the PINN should also take the exogenous inputs into account during training, but simply changing the input size of the chain to 4 isn’t sufficient. I’m unsure what other modifications to make in the code. Maybe the additional loss function needs to be adjusted to incorporate the inputs as well?

@SathvikBhagavan maybe take a look here?

Looking at the implementation, there is one bug that caught my eye:

In the additional loss function, phi(t_data, θ) gives a matrix instead of a vector. Do something like:

# Reshape the data to match the (1 × N) matrix output of phi
wr_data2 = reshape(wr_data, 1, :)
function additional_loss(phi, θ)
    return sum(abs2, phi(t_data, θ) .- wr_data2) / size(wr_data2, 2)
end

Also, looking at the data, it looks very noisy; maybe use an interpolation of it in the additional loss?
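For example (untested sketch; the spline degree and number of control points here are arbitrary placeholders):

# Sketch: smooth the noisy rotor-speed data with a low-degree B-spline
# approximation and fit the PINN against the smoothed curve instead
wr_smooth = BSplineApprox(wr_data, collect(t_data), 3, 50, :ArcLen, :Average)
wr_smooth2 = reshape(wr_smooth.(t_data), 1, :)

function additional_loss(phi, θ)
    return sum(abs2, phi(t_data, θ) .- wr_smooth2) / size(wr_smooth2, 2)
end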

Also, the time span is very large. PINNs are generally not great with large time spans, so I recommend trying it out on smaller time spans first.
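Something along these lines (sketch; the window boundaries are arbitrary):

# Sketch: train on a short window first, then widen it
tspan_small = (30.0, 60.0)
idx = tspan_small[1] .<= t_data .<= tspan_small[2]
t_small = t_data[idx]
wr_small = reshape(wr_data[idx], 1, :)

prob_small = remake(prob_model; tspan = tspan_small)
function additional_loss_small(phi, θ)
    return sum(abs2, phi(t_small, θ) .- wr_small) / size(wr_small, 2)
end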

If your goal is just to estimate parameters, maybe doing it through normal parameter estimation routines (without PINNs) would be better.
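A rough sketch of that route (untested; assumes the loss is smooth enough for forward-mode AD, otherwise swap in a derivative-free optimizer):

# Sketch: classic least-squares parameter estimation with Optimization.jl,
# no PINN involved; prob_model, t_data, wr_data as defined above
using Optimization

function ls_loss(p_est, _)
    sol = solve(remake(prob_model; p = p_est), AutoTsit5(Rosenbrock23());
                saveat = t_data)
    return sum(abs2, sol.u .- wr_data)
end

optf = OptimizationFunction(ls_loss, Optimization.AutoForwardDiff())
optprob = OptimizationProblem(optf, p)
res = solve(optprob, LBFGS(linesearch = BackTracking()); maxiters = 100)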

Hi @SathvikBhagavan and @ChrisRackauckas. Thanks a lot for getting back to me. I made the suggested changes and here are the new estimation results. It makes more sense now, even though the PINN still smooths out my dynamics, making the result worse than the least squares one.

There is still something that I believe is incorrect in this implementation. In this system, the neural network should also map the corresponding inputs at each time step to the predicted state variable, and right now it does not include the inputs in the first layer. But including them is not that easy (I think), because apart from changing the loss function, there would be a mismatch with the input size of the ODEProblem. I have not found any example in the NeuralPDE.jl documentation that deals with this.

Regardless, I guess I misunderstood how PINNs behave when it comes to solving inverse problems. I initially expected that the PINN would estimate the parameters to the best of its ability, similar to standard Bayesian inference, and then compensate for any remaining mismatch due to unmodeled physics through the neural network’s own action, but that is not the case. I have two questions to wrap up this discussion (apart from the implementation issue mentioned above):

  1. Is there a strategy that precisely follows what I described earlier? That is, estimating parameters as accurately as possible and compensating for missing physics at the same time? Would this be the Universal Differential Equations (UDE) approach as shown in Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations · Overview of Julia's SciML or is there another approach?

  2. I understand that PINNs are typically used to efficiently solve the forward problem of complex systems, but I don’t get the point of PINNs for solving inverse problems (assuming that the current implementation is correct, which I doubt). If you know the structure of the differential equations but lack the parameters, Bayesian inference can be applied. If some structure is known but there are unmodeled dynamics, the UDE or SINDy approach can be used. If you’re entirely uncertain about the form of your differential equation, NeuralODEs or their variants can be employed. So I can’t really see the point of PINNs for inverse problems if they are going to average out your dynamics. What’s the benefit I’m not seeing?

Thank you in advance.

In this system, the neural network should also map the corresponding inputs at each time step to the predicted state variable, and right now it does not include the inputs in the first layer. But including them is not that easy (I think), because apart from changing the loss function, there would be a mismatch with the input size of the ODEProblem.

I am not sure what you mean. You have a defined input function. Why would it be different from, say, sin or any other function?
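That is, the interpolants are already plain callables of t; as far as the PINN is concerned, they are structurally no different from sin:

vr_interp(35.0)  # wind speed at t = 35 s, just a function evaluation
sin(35.0)        # the same kind of dependency on t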

Is there a strategy that precisely follows what I described earlier? That is, estimating parameters as accurately as possible and compensating for missing physics at the same time?

Yeah, you can use UDEs.
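For example, replacing the parametric Cp(λ, θ) with a small network along these lines (untested sketch; the architecture and the ComponentArrays usage are my assumptions, and the interpolants are the ones from the script above):

using ComponentArrays

# Constants from the original model, now in global scope for the sketch
μd, Rr, ρ = 0.05, 120.998, 1.225
Ar, Jr, Jg = π * Rr^2, 321699000, 3.777e6

NN_Cp = Lux.Chain(Lux.Dense(2, 16, tanh), Lux.Dense(16, 1))
ps_Cp, st_Cp = Lux.setup(rng, NN_Cp)
ps_Cp = ComponentArray(ps_Cp)  # flat, optimizable parameter vector

function wind_turbine_ude(u, p, t)
    wr = u
    λ = wr * Rr / vr_interp(t)
    Cp_raw, _ = NN_Cp([λ, θ_interp(t)], p, st_Cp)  # network replaces Cp(λ, θ)
    Cp = max(0.0, Cp_raw[1])
    Tr = ρ * Ar * vr_interp(t)^3 * Cp / (2 * wr)
    return ((1 - μd) * Tr - Tg_interp(t)) / (Jr + Jg)
end

prob_ude = ODEProblem(wind_turbine_ude, u0, tspan, ps_Cp)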

So, if the problem has some missing physics, I don’t think PINNs would work at all; they assume you have the correct differential equations. Solving inverse problems with PINNs is basically optimizing the weights of the neural network and the model parameters together. You can also do it step by step: do the standard optimization for parameters first and then learn a PINN.
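For the step-by-step route, something like this (sketch; res.u is assumed to come from a prior least-squares fit such as the Optimization.jl snippet above):

# Sketch: warm-start the PINN from previously estimated parameters
prob_warm = remake(prob_model; p = res.u)
sol_NN2 = solve(prob_warm, alg_NN; verbose = true, maxiters = 500, saveat = t_data)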

PINNs have known problems with high-frequency features; https://maths4dl.ac.uk/wp-content/uploads/2023/07/Moseley.pdf is a good resource.

If your end goal is to learn missing physics, UDEs would be better suited. PINNs are more like surrogates for your original complete model.

Thank you @SathvikBhagavan, that makes more sense to me now.

Regarding my implementation doubt, I mean the following:

Based on Stiasny’s PhD thesis “Physics-Informed Neural Networks for Power System Dynamics” (2023), solving the inverse problem with PINNs (or BPINNs, if you include priors) looks like this:

[diagram from the thesis]

The upper half of the figure represents the trajectory estimation: the neural network h_θ (θ refers to the neural network parameters) estimates x̂ for the input values t of the dataset D. Then the observation function g is applied (if you don’t measure your states directly) and you define your model likelihood, from which you can estimate parameters, potentially together with the priors. The lower half represents the ODE estimation: you evaluate the trajectory estimate on collocation points D_c, obtaining x̂_c and dx̂_c/dt via automatic differentiation (AD). You then compute the difference between your model f_λ applied to the predictions (λ refers to the model parameters) and your AD-computed derivatives, and form another likelihood function, which is used to estimate λ but also affects θ.
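In simplified, point-estimate loss form, my reading of that scheme is

$$
\mathcal{L}(\theta, \lambda) = \sum_{t_i \in D} \left\| g\big(h_\theta(t_i)\big) - y_i \right\|^2 + \sum_{t_j \in D_c} \left\| \frac{d}{dt} h_\theta(t_j) - f_\lambda\big(h_\theta(t_j)\big) \right\|^2
$$

where the first sum is the trajectory (data) term, the second is the ODE (physics) term, and both are minimized jointly over θ and λ.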

I thought that, in this case, the neural network should receive not only t but also u (which does not appear in the diagram), representing my inputs. The dataset D would then also include u, and the model would be evaluated as f_λ(x̂, u).

I don’t think this is how it’s implemented right now. In another paper, Legaard et al., “Constructing Neural Network-Based Models for Simulating Dynamical Systems” (2022), they classify neural networks into direct-solution and time-stepper models, as seen below:

[classification diagram from the paper]

They state that PINNs are direct-solution models, while NeuralODEs, Hamiltonian neural networks, etc. are time-stepper models. Apparently it is relatively simple to include inputs in time-stepper models (as seen in the diagram below), but I don’t think they mention how to do it for direct-solution methods, so perhaps these do not lend themselves well to introducing inputs.

[time-stepper diagram from the paper]

You’re mixing up a few different factors here. Yes, if you have an open-loop control input u(t) in the function, you have to account for it. You can either include it in the normal way in the parameters, include it in the inputs to the neural network, or use a physics-informed neural operator. The latter should be merging rather soon; see:

We should make sure to cover that in the interface.
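For reference, the “in the parameters” option is just the standard SciML pattern of passing the input functions through p instead of closing over globals. A hypothetical sketch (note this works for plain ODE solves and UDEs; for NNODE with param_estim = true, prob.p is the vector being estimated, which is exactly the mismatch mentioned above):

# Hypothetical sketch: coeffs holds c1..c10, inputs the three interpolants
function wind_turbine_p(u, p, t)
    coeffs, inputs = p
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10 = coeffs
    Tg_t, θ_t, vr_t = inputs[1](t), inputs[2](t), inputs[3](t)
    μd, Rr, ρ = 0.05, 120.998, 1.225
    Ar, Jr, Jg, x = π * Rr^2, 321699000, 3.777e6, 0
    wr = u
    λ = wr * Rr / vr_t
    λi_inv = 1 / (λ + c9 * θ_t) - c10 / (θ_t^3 + 1)
    Cp = max(0, c1 * (c2 * λi_inv - c3 * θ_t - c4 * 1/λi_inv * θ_t -
                      c5 * θ_t^x - c6) * exp(-c7 * λi_inv) + c8 * λ)
    Tr = ρ * Ar * vr_t^3 * Cp / (2 * wr)
    return ((1 - μd) * Tr - Tg_t) / (Jr + Jg)
end

p_with_inputs = (p, (Tg_interp, θ_interp, vr_interp))
prob_p = ODEProblem(wind_turbine_p, u0, tspan, p_with_inputs)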

Thank you @ChrisRackauckas, I look forward to trying that method. Are there any examples that show how to include these inputs u(t) in the normal way, that is, in the parameters or in the inputs of the neural network? I’m not totally sure what exactly I should change in my code in those two cases.

Apart from that, I’ve been trying to implement this as a UDE problem and I’m having some complications. I will open a new topic to discuss this.