Moving Horizon Estimation

Hello everybody,

I am having some issues with the Moving Horizon Estimation (MHE) code I am writing. To me, the task and the code seem quite easy, but I cannot obtain the desired results for some reason.

I attach here the code that you can get also from my GitHub repo.

using MAT
using DataFrames
using Dates
using OrdinaryDiffEq, LinearAlgebra
using Optimization, OptimizationPolyalgorithms, OptimizationOptimJL
using SciMLSensitivity, Zygote, ForwardDiff
using Plots, LaTeXStrings
using Statistics, StatsBase 
using OptimizationMOI, Ipopt

Npop = 59240329  # Total Population of Italy
N = 399          # Total data length
N_mhe = 21       # Estimation horizon (3 weeks)
Ts = 1.          # Time steps for integration

# Create the "Date" vector 
start_date = Date(2020, 8, 31)
end_date = Date(2021, 10, 3)
date = collect(start_date:end_date)

SIDTTHEfile = matopen("SIDTTHE_data_DEF.mat")
SIDTTHE_data = read(SIDTTHEfile, "SIDTTHE_data")
close(SIDTTHEfile)

I_data = SIDTTHE_data[1, 1]["data"] / Npop
D_data = SIDTTHE_data[2, 1]["data"] / Npop
T_data = (SIDTTHE_data[3, 1]["data"] + SIDTTHE_data[4, 1]["data"]) / Npop
H_data= SIDTTHE_data[5, 1]["data"] / Npop
E_data = SIDTTHE_data[6, 1]["data"] / Npop
S_data = ones(1,399) - (I_data + D_data + T_data + H_data + E_data)

# Gather all the data in a single data matrix
measure_mat = vcat(S_data, I_data, D_data, T_data, H_data, E_data)

function SIDTHEfun!(du, u, p, t)
    s, i, d, t, h, e = u 
    α, δ, γ, σ, τ = p
    λ = 0.1     # coefficient Lambda (fixed)
    du[1] = -s * i * α                         # dS/dt
    du[2] = s * i * α - (γ + λ) * i            # dI/dt
    du[3] = i * γ - d * (λ + δ)                # dD/dt
    du[4] = δ * d - (σ + τ) * t                # dT/dt
    du[5] = (i + d) * λ + t * σ                # dH/dt
    du[6] = t * τ                              # dE/dt
end

u0 = [S_data[1]; I_data[1]; D_data[1]; T_data[1]; H_data[1]; E_data[1]]
α0 = 0.25 
γ0 = 0.12
δ0 = 0.01
σ0 = 0.02
τ0 = 0.02
p0 = [α0, γ0, δ0, σ0, τ0]

# First simulation with initial condition parameters
tspan = (1.0, N_mhe)
tsteps = collect(range(1, stop = N_mhe, length = N_mhe))
prob = ODEProblem(SIDTHEfun!, u0, tspan, p0)

function simulate(p)
    newprob = remake(prob, u0 = u0, p = p)
    solsim = solve(newprob, Tsit5(), saveat = tsteps)
    return reduce(hcat,solsim.u)
end

function loss(p)
    sol = simulate(p)
    loss = sum(abs2, ((sol.- ymeas) ./ maximum.(eachrow(ymeas)) )) #= + sum(abs2, p - p_tilde) sum(abs2,  sol[:,1].- x_tilde) =#
end

p_lb = [0.05, 0.005, 1e-4, 1e-4, 1e-4] # lower bounds on parameters
p_ub = [0.8, 0.6, 0.6, 0.5, 0.5] # upper bound on parametersu0

states_dyn = [ S_data[N_mhe]; I_data[N_mhe]; D_data[N_mhe]; T_data[N_mhe]; H_data[N_mhe]; E_data[N_mhe] ]
params_dyn = [ α0; γ0; δ0; σ0; τ0 ]

for k in N_mhe:N-N_mhe
    global u0
    global ymeas = measure_mat[:, k-N_mhe+1:k] #available measurement for each time window

    if k == N_mhe
        # u0 = u0 in the first iteration (for simulate(p) function)
        global x_tilde = u0
        global p_tilde = p0
    else 
        u0 = reshape(opti_odesol[:,2],6,1) # initial condition for the new simulate(p) function
        global x_tilde = reshape(opti_odesol[:,2],6,1)
        global p_tilde = reshape(optires.u, 5, 1) 
    end

    adtype = Optimization.AutoForwardDiff()
    optf = Optimization.OptimizationFunction((x, p) -> loss(x), adtype)
    optprob = Optimization.OptimizationProblem(optf, p0, lb=p_lb, ub=p_ub) # Initial guess for parameter estimation
    optires = Optimization.solve(optprob, Ipopt.Optimizer(), maxiters = 5000) # Ipopot Optimizer
    opti_odesol = solve(remake(prob, u0 = u0, p = optires.u), Tsit5(), saveat = tsteps)
    opti_odesol = reduce(hcat,opti_odesol.u)

    # Save of the optimization values
    params_dyn = hcat( params_dyn, reshape(optires.u, 5, 1) )
    states_dyn = hcat( states_dyn, reshape(opti_odesol[:,end],6,1) )

    # update on the initial conditions
    p0 = reshape(optires.u, 5, 1)
end

# --- Results Plot ---
plot_font = "Computer Modern"
default(fontfamily=plot_font,
        linewidth=2, framestyle=:box, label=nothing, grid=true) 
scalefontsizes(1)

plotS = plot(date[N_mhe:N-N_mhe], measure_mat[1,N_mhe:N-N_mhe],label="Real Data")
plotS = plot!(date[N_mhe:N-N_mhe], states_dyn[1,2:end],label="Sim Opti Params")
plotS = plot!(ylabel="Population", title="S - Susceptible population")

plotI = plot(date[N_mhe:N-N_mhe], measure_mat[2,N_mhe:N-N_mhe],label="Real Data")
plotI = plot!(date[N_mhe:N-N_mhe], states_dyn[2,2:end],label="Sim Opti Params")
plotI = plot!(ylabel="Population", title="I - Infected population")

plotD = plot(date[N_mhe:N-N_mhe], measure_mat[3,N_mhe:N-N_mhe],label="Real Data")
plotD = plot!(date[N_mhe:N-N_mhe], states_dyn[3,2:end],label="Sim Opti Params")
plotD = plot!(ylabel="Population", title="D - Detected population")

plotT = plot(date[N_mhe:N-N_mhe], measure_mat[4,N_mhe:N-N_mhe],label="Real Data")
plotT = plot!(date[N_mhe:N-N_mhe], states_dyn[4,2:end],label="Sim Opti Params")
plotT = plot!(ylabel="Population", title="T - Threatened population")

plot_α = plot(date[N_mhe:N-N_mhe], params_dyn[1,2:end],label="Real Data")
plot_α = plot!(ylabel="Population", title="α parameter")

My goal here is to minimise the LSE between data and model estimation, obtaining a set of parameters p for every window.
The algorithm I am running here for the MHE works like this:

  1. Initial parameter p0 and states u0 guess is made

  2. The first LSE minimisation problem is solved for the past 21-day time window

  3. The results from the LSE are collected (in this case the set of parameters obtained) and the ODE system is re-simulated in order to obtain state trajectory results.

  4. The second step (t+1) minimisation initial conditions are set as follows: The initial condition for the states u0 is set as the 2nd step of the previously simulated trajectory, since the widows are shifting one step into the future, so the second step of the window @ t = t_k must be close to the initial conditions of the window @ t = t_k+1.
    The initial parameter guess is set equal to the previously obtained parameter.

  5. Steps 3-4-5 are repeated shifting the window one day at a time and the FINAL VALUES of the state trajectory in each time window are collected and appended to the states_dyn vector at every time step.

The code runs, but it seems that the minimization is working only on one set of data (in this case, all data curves are way off the resulting prediction, but the T_data one), while the others are totally disregarded. I thought this was a normalisation error, but the data fitting process did not work even after normalising the data in the cost function (see sum(abs2, ((sol.- ymeas) ./ maximum.(eachrow(ymeas)) )) )

Moreover, I would like to add some more terms to the cited cost function, such as that initial conditions between consecutive windows must be close to one another and that parameter estimation results from consecutive windows should not vary too much (to avoid overfitting).

I am open to every tip/help on this topic, I am pretty new to the Julia Language and could have made a ton of errors, so if you have any critiques on how the code is structured, please let me know!!

Thanks for the help

Don’t know if my comments will be helpful, but let me try to lay out the MHE for systems in state space form (extension to DAEs should be straightforward).

So you have a model in the form

x_{t+1} = F(x_t, u_t, w_t)

where x_t is the state at time t, u_t is a known input at time t, and w_t is an unknown input at time t.

You observe the state x_t through an observation y_t described as

y_t = G(x_t, v_t)

where v_t is some observation error/noise. Normally, the output is affine in v_t, typically, y_t = g(x_t) + v_t.

Next, you have a collection of measurements:

\mathcal{Y}_{1:T} = [y_1, \ldots, y_T]

and you have an apriori estimate of x_1, denoted by \hat{x}_{1|0}. Here, syntax \hat{x}_{m|n} means the best possible estimate of x_m with available measurements \mathcal{Y}_{1:n}.

With this set-up, you can define a cost function V:

V = \frac{1}{2}||x_1-\hat{x}_{1,0}||^2_{X_1} + \sum_{t=1}^T(||v_t||^2_{V_j} + ||w_t||^2_{W_j})

and minimize V with the model x_{t+1}=F(x_t,u_t,w_t) and output y_t=g(x_t)+v_t, t\in \{1:T\} + data \mathcal{Y}_{1:T} as constraints (you need to fine think on the sum boundaries, etc.).

If you have unknown parameters, just add them to the x.

Typical values of matrices X, V, and W are the inverse of covariance matrices of the various variables.


When you have found the solution (\hat{x}_{t|T}, \hat{v}_{t|T}, \hat{w}_{t|T}) - where the state is of main interest, if t<T, this is known as a smoothed estimate, if t=T you have the filtered estimate, while if you try to compute t>T (e.g., by assuming values for u_t and w_t for t>T), you have a forecast or a predicted estimate.


So far, so good. The next step is to make your estimator into a Moving Horizon Estimator.

In principle, you just increase the value of T by one, and add one more output in \mathcal{Y}_{1:T}, where you use the improved estimates of initial state and initial parameters (if any) in the cost function. For linear systems and with Gaussian noise + above indicated choice of weight matrices, this will make \hat{x}_{T:T} the Kalman filter estimate at time T.

In practice, you can not operate with a steadily growing \mathcal{Y}_{1:T}, so the normal solution is to keep the size of \mathcal{Y}_{1:T} constant by tossing out the oldest observation y_1 every time you add a new one. This is what is known as Moving Horizon Estimator. Then, as you indicate, you will use the estimate \hat{x}_{2|T} \rightarrow \hat{x}_{1|0}, more or less.


Is there a difference in what I have suggested, and what you have done?

My impression is that you have assumed a perfect model in that you have neglected both the possibility of (i) a disturbance w_t, and (ii) a measurement error v_t.

In other words, you seem to only use an error in model parameters and initial states.

I think you need to include the possibility of unknown disturbances (as a minimum), and probably also unknown observation errors.


OK – don’t know if this was of any help.

I think there is a package that already implements MHE. Maybe you can check that one out?

Thanks for the time spent answering Brent.
I agree with you that I should add both disturbances and measurement errors (generally additive gaussian white noise) in the dynamics, and try to minimise them.
In the code here I am trying to add one term at a time into the cost function, trying to make it more complex step by step, that’s why the noise is not there yet.

I have already done this algorithm in MATLAB using CasADi, following the theoretical guidelines provided in the do-mpc MHE documentation and it works, so I do not understand why the “translation” in Julia is giving me this issues.
I think is much more of a coding problem rather than a theoretical one, but I appreciate your help.

I will look at the MHE package, thanks again for the tips!

Have a look at the docs

So you got good results with your MATLAB implementation, with the same formulation? Then it should be a coding issue. Take a look at Baggepinnen’s package and compare your results.

Since you are implementing an infection model (COVID?) and using real data (I assume COVID data), be aware of that such models in general are far from perfect. It is difficult to get good results without assuming some “inputs” over a long time horizon, e.g., because:

  • Health authority “advice” or restrictions on use of face mask, distancing, etc. are in effect “control inputs” that change infection rates, etc.
  • Few models attempt to describe the cause-effect link between these “control inputs” and, e.g., infection rate constants.

As an example: I used Covid models as a modeling project in my modeling class in the fall of 2020, with fit to experimental data from Johns Hopkins. I compared cases like my own country + Spain + Italy. In early November, the model predicted a collapse in the Italian health system by the end of November. But this did not happen. Possibly because the model I used was not good, or possibly because they introduced restrictions on distancing, face mask, etc. that my model did not include.

Some ways around this could be:

  • assume slowly varying parameters, e.g., p(t+1) = p(t) + noise. This should adapt the value of the parameters.
  • try to model the response of the public to governmental “health orders” (not everyone will or can follow such “orders”)
  • a possible way to model such “health orders” could be to use neural network blocks (universal models), or similar

Anyways, interesting problem.

ModelPredictiveControl.jl is entirely the fruit of Francis Gagnon’s hard work, I take no credit for this :blush:

1 Like

Oops. What-a-mistak-e-to-make-e ('Allo-'Allo)… [there is a similar package in JuliaHub??]

Yeah, I’ve already done this MHE problem in MATLAB and it worked fine. I wanted to switch to Julia because it’s far better for scientific purposes.

I have been working on the topic since August last year and assure you that the data collection and pre-processing have been the hardest issues to solve.
I think that the theoretical idea and the method is there, and it should be bulletproof, the issue sits in my coding ability :sweat:

Right now it seems that the ODEs I am trying to fit give more importance to one set of data than to another (for example, the Detected curve is well fitted, while all the others are way off). I had already encountered this problem previously, and I solved it by normalising the measurement data, but here it seems to not work.

In any case, thanks for the advice.

You handle this by choosing appropriate covariance matrices for dynamics and measurement noises. In this docstring, these are called Q and R. For a nonlinear dynamical system, both of these are important and their absolute values matter, not only their relative values as is the case for linear dynamics.