Solving ODE parameters using experimental data with control inputs

Maybe defining the control as a function of time and using it in the definition of the system works?

dudt(u, p, t) = p[1]*u + p[2]*D(t) + p[3]
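
For example, a minimal sketch (the step time and all parameter values here are made up):

using DifferentialEquations

D(t) = t < 1.0 ? 0.0 : 1.0                  # hypothetical step input at t = 1
dudt(u, p, t) = p[1]*u + p[2]*D(t) + p[3]

prob = ODEProblem(dudt, 0.0, (0.0, 10.0), [-10.0, 10.0, 1.0])
sol = solve(prob, Tsit5(), tstops = [1.0])  # tstops marks the discontinuity for the solver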

There are lots of examples of parameter estimation using DiffEqFlux.jl. It is a lot like DiffEqParamEstim.jl, except that it treats the problem as an arbitrary chain of differential equations and neural networks. (You can leave out the neural networks, and it makes no difference.) I mention it because there is more documentation and there are more examples, including exogenous input signals.

I see. Indeed, for the purposes of mere simulation, a time-varying control input can be included in the ODEProblem definition as a parameter, and there is no problem with this parameter actually being a function of time, as this example shows. But for the purpose of estimating unknown parameters, the control input must be included in the ODEProblem in a different way so that it is not optimized over.
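
For example, simulation with a time-varying input stored in the parameters works fine, something like this sketch (all values made up):

using DifferentialEquations

p = (a = -10.0, b = 10.0, u = t -> sin(t))  # the "parameter" u is itself a function of time
f(x, p, t) = p.a*x + p.b*p.u(t)

prob = ODEProblem(f, 0.0, (0.0, 10.0), p)
sol = solve(prob, Tsit5())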

Sort of, but I believe your ODEProblem should always be the same, and it’s the optimization that will be defined in a different way.

In my mind, the ODEProblem should always have all the parameters to describe the system, and that can include known and unknown parameters and such, and exogenous input functions. It’s a kitchen sink of everything other than the ODE states and hard-coded coefficients. On the other hand, there is an optimization problem defined by a loss function. It has its own “parameters” which are the unknown parameters of the ODE, and nothing else. And the ODE’s p will contain those optimization parameters.

It’s a bit confusing because people say “parameters” loosely, and the context gets lost whether you’re talking about ODE or optimization. (I blame ML for this. We used to talk about optimization or decision variables, and “parameters” referred to things known and fixed unless in parameter sensitivity analysis. Then in ML it seems the vocabulary has changed to optimization parameters, perhaps because the models didn’t include physics or other parameterized stuff until more recently.)


Ehh, our issue is that “parameters” is really “user data”. It can be cache variables, things to optimize, physical constants, neural network weights, etc. We really need to take a pass through the docs and retune the terminology around it at some point.

Actually… optimization generally has a loss f(u,p) which returns a scalar. The states of the optimization problem are the parameters of the ODE. The parameters of the optimization can be the hyperparameters of the optimization itself, which can then be tuned by differentiating the optimization (though this adjoint isn’t implemented yet). That part is admittedly very very confusing.
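
As a sketch of that convention (all names are illustrative; an ODEProblem called ode_prob and data t_data, y_data are assumed to exist):

using DifferentialEquations

# The optimization "state" u_opt is the ODE parameter vector; the optimization
# "parameters" p_opt hold fixed data such as the measurement times and values.
function opt_loss(u_opt, p_opt)
    t_data, y_data = p_opt
    sol = solve(remake(ode_prob, p = u_opt), Tsit5(), saveat = t_data)
    sum(abs2, Array(sol) .- y_data)   # returns a scalar
end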

Awesome, I didn’t realize I could do that. It does make the generation of tgrad and the Jacobian fail, but that doesn’t seem to affect the parameter optimization. Separate question, but I wonder what implications that has for how I can use those models…

For the interested, I constructed the model as follows

sma_ode! = @ode_def SMATemperatureModel begin
    dT = a*T + b*Df(t) + c
end a b

where Df(t) and c are externally defined.
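
For completeness, here is hypothetical usage with made-up values (Df and c have to exist before the model is built):

using DifferentialEquations

Df(t) = t < 1.0 ? 0.0 : 1.0       # exogenous input
c = 20.0                          # externally defined offset

prob = ODEProblem(sma_ode!, [25.0], (0.0, 10.0), [-0.5, 0.8])  # p = [a, b]
sol = solve(prob, Tsit5())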

That library was helpful; I think a lot of the usage examples carry over to DiffEqParamEstim as well, so I was able to use that. Thanks!

That’s just speed optimizations.

I think it would be useful to work through the terminology in documentation. Part of the problem is that different scientific disciplines use concepts differently. Two relevant examples:

  1. What is a parameter? I was always told (in university) that
  • “mathematicians” tend to use the word parameter in the meaning of a scalar constant (or a collection of such). This is how the word is used in, e.g., Modelica [although Modelica uses the concept of constant to denote “natural constants”, and parameter to denote problem dependent constants]. On the other hand,
  • “physicists” use the word in a wider meaning, e.g., as in “distributed parameter system” – where “parameter” is also used to denote independent variables, etc.
  • I’m not used to the very general interpretation of “all user provided information”.
    I have no particular competence to say which use of the word “parameter” is best/most useful. But it is important to realize that there are different uses of this word among users of Julia/MTK, and good documentation should include some discussion of how the word is understood, e.g., in ModelingToolkit.
  2. What is a state?
  • For finite dimensional dynamic systems (with time as independent variable), I’m used to state denoting “a minimal set of variables which at initial time uniquely defines the solution”. In some work (e.g., R. Kalman), the state is therefore seen as representing the history of a system.
    For ordinary differential equations, this (normally) implies reducing the equations to a set of first order differential equations of the form \frac{dx}{dt} = f(x;t); then x (or any unique transformation of it) is a state (see the worked example after this list).
    [This use is implied in literature/songs, e.g., “That’s the way it all should happen // When you’re in the state you’re in” from St. Dominic’s Preview.]
    For DAE systems, things are more complex. With \frac{dx}{dt}=f(x,z;t) and 0=g(x,z;t) – what is the state? This depends on the index of the DAE. If one refers to the DAE variables x as differential variables (and the z variables as algebraic variables), the actual number of states may be equal to the number of x variables, it may be lower (if one of the x variables has been introduced by differentiating an algebraic constraint), or it may be larger – e.g., for higher index DAEs.
  • In Thermodynamics, one uses the word state in a different (but related?) way, e.g., in Equation of State. Here, state is perhaps similar in that it refers to a minimal set of information. But there is no dynamics involved.
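
As a worked instance of the reduction mentioned under item 2: a second-order equation \ddot q = h(q, \dot q; t) is put into the first-order form \frac{dx}{dt} = f(x;t) by stacking position and velocity,

\begin{align}
x = \begin{pmatrix} q \\ \dot q \end{pmatrix}, \qquad
\frac{dx}{dt} = \begin{pmatrix} x_2 \\ h(x_1, x_2; t) \end{pmatrix},
\end{align}

so that x (position and velocity together) qualifies as a state in the above sense, while q alone does not.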

To me, a sentence like “The states of the optimization problem are the parameters of the ODE” sounds a little confusing. Here, it seems like “state” is used to mean “unknown”. Unknowns in optimization problems do not always have unique solutions, while the common use of “state” (at least in control theory) implies some sort of minimal/unique information.


Note that I haven’t announced it anywhere yet, but I’ve started working on documentation that covers the full flow and composability of the SciML ecosystem at Home · SciMLBase.jl . It’s still very sparse right now, but I think these are some of the questions we should really answer there.


Just a minor argument: I do not think that it is commonly required that the set of variables must be minimal in order to qualify as a (characterization of a) state (of a finite-dimensional dynamic system). You can easily have state equations that can (but need not) be reduced to a minimal realization.

Well, sort of. But it is perhaps worth adding that:

The distinction between the two lies in the fact that a parameter value can be changed from one simulation to the next whereas the value of a constant cannot be changed once the model is compiled.

Minimal realization is not the same as minimal required information for finding a unique solution to dynamic systems.

“Minimal realization” as a concept relates to systems, i.e., dynamic systems with inputs and outputs. The point is that one can then eliminate the parts of the state space that are not both controllable and observable – the remaining subsystem, which is both controllable and observable, constitutes the minimal realization.

If you simply have a dynamic system (differential equations) and do not care about inputs and outputs, then the state is normally considered the minimal information to ensure a unique solution to the dynamic system.

In other words: for system \frac{dx}{dt}=f(x;t), the state is the minimal information to ensure a unique solution. For this case, it is not possible to talk about minimal realization.

However, for the system given by dynamics \frac{dx}{dt} = f(x,u;t) and output y=g(x,u;t), one can talk about a minimal realization from input u to output y, which is defined such that the state x must be controllable from u and observable from y.
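
For linear systems this can be made concrete with ControlSystems.jl (the system below is just an illustration):

using ControlSystems

# A 2-state realization of 1/(s+2): the first state is neither excited by the
# input (B₁ = 0) nor visible in the output (C₁ = 0), so it can be eliminated.
A = [-1.0 0.0; 0.0 -2.0]
B = [0.0; 1.0]
C = [0.0 1.0]
sys = ss(A, B, C, 0)
sysmin = minreal(sys)   # the 1-state minimal realization of 1/(s+2)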


Good point – that is the practical difference. This implies that constant is used for quantities such as \pi, Planck’s constant, etc., and sometimes gravity (unless one works for NASA, etc…), while parameter is used for constants that describe geometry, physical parameters for possibly varying substances, etc.


True. As a control engineer I somehow automatically considered dynamic(al) systems with inputs because those are of interest for me :-) And for such systems, even if the set of state variables is not minimal, we still call them state variables or a state vector.


Being a control engineer myself, too, it is also interesting to consider systems which are not controllable or observable, and ponder on what one can do to make the system controllable or observable :-o


I was just distinguishing between things to be optimized and things to be excluded, e.g. ODE parameters might include D(t) and known constants. Of course, you can hard-code it as suggested by @IlyaOrson, but often it’s helpful to treat it as a parameter to make it easy to set up different problems or inputs.

Right now you can pack everything into the parameters p and handle explicitly afterwards which parts of them are to be optimized and which are not. If you have strong feelings about the interface, maybe you would be interested in this issue. I am not sure about an alternative design; for certain problems the initial conditions could become the optimization variables, for example.
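
For example, the explicit handling could look like this (a sketch; an ODEProblem prob and data tdata, ydata are assumed to exist):

using DifferentialEquations

u_known = 0.0                          # fixed part of p, e.g. the control value
function loss(θ)                       # θ holds only the tunables, e.g. [a, b]
    p_full = vcat(θ, u_known)          # reassemble the full ODE parameter vector
    sol = solve(remake(prob, p = p_full), Tsit5(), saveat = tdata)
    sum(abs2, Array(sol) .- ydata)
end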

Let me summarize the original problem of this thread because I am not quite sure if it has been resolved completely:

If the function f(u,p,t) in the ODEProblem uses, besides some “tunable” (or “to-be-determined”) parameters assembled into p, also some parameters that are not to be tuned/estimated, such as the (mostly time-varying) control inputs, it is unclear how to estimate the former set of parameters following the procedure described in the manual for DiffEqParamEstim.jl. It was unclear to the author of the original post – and I think it has not yet been cleared up – how to tell the optimizer (to be called) over which parameters the optimization should be performed (and which parameters should be left intact).

It appears that the author @Anthony of the original post is happy with the advice to solve the problem by defining the control signal as a function of time. With this solution he soon runs into trouble with pulsed control signals, as he describes in his next post. Obviously the significant discontinuity (the sharp edges) is the culprit. It seems to be possible to mitigate the problems by specifying the times of the switching through the tstops property (or perhaps even defining a callback that “does nothing”, if I understand it correctly). The problem seems to be answered/resolved.

Nonetheless, there may still be good reasons for defining the control input through parameters (possibly in combination with callbacks if the control evolves in time) – feedback control is a prominent example. Or this is at least my understanding after following the discussion at DiffEqs: Hybrid (continuous+discrete) system / periodic callback - #38 by ChrisRackauckas. Is it possible to estimate just a subset of the parameters then?

The original post did not contain an example, which is why I am providing one; perhaps it will be easier to continue the discussion then:

We consider a system modelled by \dot x(t) = a x(t) + b u(t) , where x is the state variable, u is the (known) control input. a and b are the parameters to be determined. (Note that I simplified the original problem by removing the offset parameter and considering just a perfectly linear system, for which alternative ways to solve the problem are available).

When generating the input-output data, I have also included some noise(s) in the model. First, a noise \zeta that affects the dynamics (aka random disturbance or process noise). And then some measurement noise \nu that adds to the measured state. Furthermore, the (noisy) state measurements are only available at some equidistant time instants with period T_s. Here is a full model used for generating the input-output data:

\begin{align}
\dot x(t) &= a x(t) + b u(t) + \zeta(t),\quad x(0)\; \mathrm{given}\\
y(kT_s) &= x(kT_s) + \nu(kT_s)
\end{align}

The actual control signal u will be designed in advance. It will consist of two (rectangular) pulses – first it goes from zero to one, then from one to minus one and then, finally, to zero again.

Below is the code for generating the data (advice on how to do this or that better is welcome, but here it really just serves the purpose of creating the “experimental” data):

using DifferentialEquations
using Plots

# Setting up the continuous-time system
f(x,p,t) = p[1]*x + p[2]*p[3]       # The standard RHS for an ODE, also the drift term for an SDE
g(x,p,t) = 0.1                      # The diffusion term for an SDE

x₀ = 0.0                            # The initial state
tspan = (0.0,10.0)                  # Time interval for the solution

a = -10.0                           # Nominal parameters of the model
b = 10.0                            # ---//---
u = 0.0                             # The control is initially zero
p = [a,b,u]

prob = SDEProblem(f,g,x₀,tspan,p)   # Formulating the problem as a stochastic ODE to include noise

# Accounting for the external (control) input through callbacks
tₛ = [1.0,3.0,5.0]                  # Times when the control switches

condition1(u,t,integrator) = t==1.0 # The control switches to +1
affect1!(integrator) = integrator.p[3] = 1.0
cb1 = DiscreteCallback(condition1,affect1!)

condition2(u,t,integrator) = t==3.0 # The control switches to -1
affect2!(integrator) = integrator.p[3] = -1.0
cb2 = DiscreteCallback(condition2,affect2!)

condition3(u,t,integrator) = t==5.0 # The control switches back to zero again
affect3!(integrator) = integrator.p[3] = 0.0
cb3 = DiscreteCallback(condition3,affect3!)

cb = CallbackSet(cb1,cb2,cb3)

# Solving the problem
sol  = solve(prob,SRIW1(),callback=cb,tstops=tₛ)

# Extracting the states at regular periods to pretend we measure them periodically
Tₛ = 0.2                            # Sampling period
tₘ = range(tspan[1],tspan[2],step=Tₛ)
solₘ = sol(tₘ)

# Adding a "measurement" noise to sampled values
yₘ = solₘ.u + 0.1*randn(Float64,length(solₘ))

# (Preparing for) plotting the simulated states
p₁ = plot(sol, lw=2, label="Simulated in continuous time", xaxis = "t", yaxis = "x(t)")
scatter!(tₘ,yₘ,label="Noisy measurements in discrete time")

# Creating the sampled sequence of controls since these were not stored during simulation
function doublet(t)
    if t < tₛ[1]
        u = 0.0
    elseif t < tₛ[2]
        u = 1.0
    elseif t < tₛ[3]
        u = -1.0
    else
        u = 0.0
    end
    return u
end

uₘ = doublet.(tₘ)                           # Sampled controls

# (Preparing for) plotting the controls
tₚ = range(tspan[1],tspan[2],step=0.001)    # Dense time for plotting the continuous-time controls
p₂ = plot(tₚ,doublet.(tₚ),xaxis="t",yaxis="u(t)", label="For simulation in continuous-time")
scatter!(tₘ,uₘ,label="Sampled in discrete-time")

# Plotting both the states and controls
plot(p₁,p₂,layout=(2,1))

[Figure: simulated state in continuous time with noisy discrete-time measurements (top), and the continuous-time and sampled control signals (bottom)]

Now the input-output data set is ready: the vectors uₘ and yₘ (obtained with the sampling period Tₛ; the vector tₘ of measurement times is available too). The parameters to be determined are a and b, and these will appear in the ODEProblem as the parameters p[1] and p[2].

prob4opt = ODEProblem(f,x₀,tspan,p)

But the optimization is certainly not to be performed over p[3]. How can this information be included in the definition of the cost function?

cost_function = build_loss_objective(prob4opt, Tsit5(), L2Loss(tₘ,yₘ),
                                     maxiters=10000, verbose=false)

Is it actually possible to make cost_function just a function of p[1] and p[2]? I was thinking about introducing equality constraints on p[3] in the optimization, but that is nontrivial because this parameter (the control signal) evolves in time; it is not fixed.
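
One pattern I can think of (a sketch, and I am not sure it is the intended DiffEqParamEstim interface) is to close over the known input so that p contains only the unknowns:

# The known control trajectory is baked into the RHS via the doublet function
# defined above, so p = [a, b] holds only the parameters to be estimated.
f_est(x, p, t) = p[1]*x + p[2]*doublet(t)

prob_est = ODEProblem(f_est, x₀, tspan, [a, b])   # [a, b] is just an initial guess here
cost_function = build_loss_objective(prob_est, Tsit5(), L2Loss(tₘ,yₘ),
                                     tstops = tₛ, maxiters = 10000, verbose = false)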

For linear systems like this one, there are a lot of well-developed methods for estimating the system parameters, as well as covariance properties and noise models etc. that are useful in a practical control scenario. For a temperature system like this, classical system identification could look something like

using ControlSystems, ControlSystemIdentification
plotly(show = false)
w = 2pi .* exp10.(LinRange(-3, log10(0.5), 500)) # frequency vector
G0 = tf(1, [10, 1]) # The true system, 10ẋ = -x + u
G = c2d(G0, 1) # discretize with a sample time of 1s
println("True system")
display(ss(G0))

# Data generation
u = sign.(sin.((0:0.01:10) .^ 2)) # sample a control input for identification
y, t, x = lsim(ss(G), u) # Simulate the true system to get test data
yn = y .+ 0.2 .* randn.() # add measurement noise
data = iddata(yn, u, t[2] - t[1]) # create a data object

# Estimation
na, nb = 1, 1 # number of parameters in denominator and numerator
Gh = arx(data, na, nb, estimator = wtls_estimator(data.y, na, nb)) # estimate an arx model
Gh2, noise_model = plr(data, na, nb, 1) # try another identification method

# Plot results
println("Estimated system")
display(ss(d2c(Gh))) # Convert from discrete to continuous time

bp = bodeplot(G, w, lab = "G (true)", hz = true, l = 5)
bodeplot!(Gh, w, lab = "arx", hz = true)
bodeplot!(Gh2, w, lab = "plr", hz = true, ticks = :default)

sp = stepplot(G, 150, lab="G (true)")
stepplot!(Gh, 150, lab = "arx")
stepplot!(Gh2, 150, lab = "plr", ticks = :default)
hline!([1], primary = false, l = (:black, :dash))

lp = lsimplot(ss(G), u, lab="G (true)")
lsimplot!(ss(Gh), u, lab = "arx")
lsimplot!(ss(Gh2), u, lab = "plr", ticks = :default)
plot!(data.t, yn[:], lab = "Estimation data")

plot(bp, sp, lp, layout = @layout([[a b]; c]))

With printout

True system
StateSpace{Continuous, Float64}
A = 
 -0.1
B = 
 1.0
C = 
 0.1
D = 
 0.0

Continuous-time state-space model

Estimated system
StateSpace{Continuous, Float64}
A = 
 -0.10260326778775565
B = 
 1.0
C = 
 0.10056519755546571
D = 
 0.0

Continuous-time state-space model

For practical identification, additional centering of the data is required.
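
For example, a minimal sketch of such centering on the data above (simply removing the sample means):

using Statistics

y_c = yn .- mean(yn)                    # center the output
u_c = u .- mean(u)                      # center the input
data_c = iddata(y_c, u_c, t[2] - t[1])  # identify on the centered data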
