Robust resampling/interpolation method

I am working on a system identification problem with poor quality input data.

The data describes a heating system, with thermostat temperatures, thermal power inputs, supply/return water temperatures etc.

Some issues with the data are:

  • frequently missing data
  • spiking noise caused by activation of heaters
  • discretization errors
  • low frequency disturbances (mainly due to sunlight, but other sources exist)

A year ago when I tackled this problem, I kind of rolled my own algorithm based on DSP.resample and tuned it by hand. One of the problems it had was it introduced wiggles/oscillations. So I probably also need antialiasing.

Now I want to setup a proper processing pipeline that I can deploy in production.

I am mainly just looking for some suggestions on algorithms / packages to handle all these problems. Unfortunately, there isn’t much I can do about the data source because these are systems used by real people (pesky users right :sweat_smile:)

If you have any other recommendations I would also be very happy to hear those.

2 Likes

Missing data can be handled by predictor/corrector methods (such as Kalman or particle filters) by not perfoming the measurement correction when there is no data available, only the prediction. You can even handle arbitrary intervals inbetween the obtained data points, here’s an example

Here’s another example that has missing data

A common approach to handle this in gaussian filtering is to evaluate the likelihood of the obtained measurement, and if the measurement is several standard deviations away from the expected one, discard it (or inflate the measurement covariance matrix). The last tutorial above computes the Z-score used to evaluate the measurement, but never discards any data.

What does this mean? Quantization?

Something like this?

or any of those?


If you have a state estimator of the system that handles all of those things, system identification can be performed by, e.g., maximizing the likelihood of the data

4 Likes

You may also find it useful to have a look at some methods and tools for outlier detection such as Hampel outlier detection/filter. An implementation in Julia can be found at [ANN] HampelOutliers.jl. Standard filtering/smoothing techniques may unnecessarily project the value of such outlier to the neighbouring samples.

1 Like

Thanks for the suggestions, very useful!

Yes exactly.

I have indeed studied those, it is a very nice overview. I think state estimation is relatively straightforward in this problem, the tricky part is forecasting states and disturbances. What I couldn’t figure out is how to deploy those disturbance models to an actual controller that does MPC. For the sunshine model it’s clear, just extrapolate the timeseries and forecasted solar irradiance (I have that).

But for the low frequency disturbance? The convention in literature seems to be to just treat it like a constant parameter and assume it stays constant in the future. Maybe you have some thoughts on that?

Yes that would be ideal.

Models identified with methods that optimize the next-step ahead prediction diverge too quickly if I simulate longer timeseries. So if I am interested in simulation performance the methods in LowLevelParticleFilters.jl don’t necessarily seem applicable here? (if this is a wrong statement please correct me)

The main goal is to get a model that performs well in MPC with a horizon of 24 hours. That behaviour tends to de dominated by what the weather and heating system do, for which I have ‘good’ data and forecasts. I probably should have mentioned it’s still the same problem as my previous post :sweat_smile:

Its reasonably straightforward if the MPC tools are in the state-space form, as it is the case for most modern MPC software. You just build your MPC with the augmented model. The augmented model is your (deterministic) model of the heating system “concatenated” with all your disturbance models (that is, the state vector will be a concatenation of the heating system states and the disturbance states). Be sure to disable any builtin model augmentation if you DIY this part (for exemple, nint_ym=0 and nint_u=0 options in ModelPredictiveControl.jl).

Yes, a low frequency disturbances is generally approximated as constant value that can be added anywhere in the deterministic model: at the manipulated inputs, at the measured output or a constant parameter used anywhere between the two. The disturbance model of a constant value is Gaussian white-noise integrated only once.

I will let Fredrik answer the other questions related to LowLevelParticleFilters.jl. It seems like a really interesting modeling and control problem! :grinning_face:

1 Like

This is indeed how I implemented it. I think I was trying to understand that when you’re looking forward like in MPC you need good deterministic models to define the ‘known/forecasted disturbances’. I have good (forecasted) data on these disturbances, but the key question is how do they affect the states.

In the disturbance gallery of LowLevelParticleFilters.jl, I think we talk about ‘unmeasured disturbances’ only. These do not seem to be useful in the deterministic MPC case beyond estimating the initial condition, because there is no ‘correct’ step as there is no measured data. I am an EE and not a control engineer so please correct me if that’s wrong.

This is a great package! Because I need to generate C code for an embedded system, I chose to implement MPC as an optimization model in cvxpy and generate C code with cvxpygen. There does seem to be LinearMPC.jl now, I should try that out :slight_smile:

1 Like

I’m also an EE, but it morphed into a weird mix of EE+ChemE+BioTech+ControlE+SoftwareE. I’m no longer sure what I am :laughing:.

Yes, the stochastic models are for the unmeasured disturbances. But these models will affect the MPC predictions/forecasts beyond the initial state. For exemple, in the 3 plots below, the dots are the predictions at the discrete-time k=90 for the stochastic system:

\frac{y_s(z)}{e(z)} = \frac{1}{1-az^{-1}}

with a=0, 0.5 and 1, respectively from top to bottom.

The a=1 case is integrated Gaussian white noise, and you can see that predictions are kept constant in the future.

If you have access to a known/measured disturbance, you first need to determine how it affects your deterministic model (for exemple, added at the manipulated inputs or a load disturbance). This measured disturbance will then be used for feedforward compensation in the MPC controller, denoted with the vector \mathbf{d} in MPC.jl. At each control period, you will have access to your known disturbance \mathbf{d}(k). But there is still one thing to determine: how you predict \mathbf{d} in the future? The simplest solution is assuming a constant value in the future: \mathbf{\hat{d}}(k+1) = \mathbf{\hat{d}}(k+2) = \dots = \mathbf{d}(k). But the exact same theory as above can also be applied here. A constant \mathbf{d} in the future is just assuming that \mathbf{d} is the output of an integrated white noise. It can be modelled with any system from aforementioned disturbance gallery, but you will need to implement the forecasting yourself.

Yes, the is no code generation for now. I should really work on building an interface with LinearMPC.jl.

Edit: it made me think @darnstrom, am I right in thinking that custom prediction for measured disturbances are not supported right now in LinearMPC.jl (it’s really similar as the implementation of reference preview) ? If yes, it would be a great addition to the package.

1 Like

It’s not so much a convention as an explicit model that says it’s constant. If you use the model
\dot{x}_d = 0 + w
for your low-frequency disturbance, i.e., an integrator of noise, you are explicitly modeling something that has zero deterministic dynamics (constant), only random fluctuations that are equally likely to go in any direction. There are more elaborate disturbance models that do not have zero dynamics, and in that case, simulating those forward in time (for forecasting) does not in general lead to a constant evolution.

diverge in what sense?

Whenever you have unmeasured disturbances you are typically not interested in pure simulation performance, there must be some mechanism present to account for the unmeasured disturbances, and the only information available about this is the measurements you do have access to. These must feed into the dynamics somehow, which is what the filtering accomplishes. The approach from your previous post on the subject sounds spot on what you want, filtering with 24h forecast at each point, minimizing the forecasting errors along the trajectory.

You can treat forecasted values (from e.g. a weather forecast) as “measurements” and incorporate those as any other measurement. Typically, those would have a rather large covariance compared to actual sensor measurements.

So am I, at least in the sense that I studied EE :sweat_smile:

2 Likes

If the system has future disturbances with uncertainties, a primitive way to handle that in a “robust MPC” is as in https://doi.org/10.3384/ecp201709: we tried to control the level in a lake (hydro power reservoir) by manipulating the flood gates, and the disturbance was precipitation. Our partnering power producer provided estimates of flow into the lake based on the dynamics from precipitation through the catchment area and into the lake. This was based on purchased weather prediction 2-3 weeks into the future, with some 50 different realizations of possible predictions. The idea was to operate the flood gates so that the control requirements were satisfied for the worst case scenario.

My colleague in that publication has since done much more work on similar problems, with more efficient algorithms for MPC.

I would guess that it would be relatively straightforward to use your MPC package to do this.

Interesting approach, it fits nicely to weather forecasting! I took a look a the paper quickly and, the deterministic MPC algorithm could be easily implemented in MPC.jl with currently available features. But scenario trees and multi-objective optimization are not supported right now (except maybe indirectly with a custom cost function + a weighted sum of each objective), so the would not be the case for stochastic MPC. Robust MPC through scenario trees are in my long-term goals, but I’m prioritizing as complete and as efficients-as-possible “non-robust” predictive controllers before entering into this rabit hole :rabbit_face:

1 Like

The way we did the robust MPC was simply to simulate into the future (say) 50 models of the system with 50 different disturbance sequences, each with the same control input sequence, then compute the cost function for each system, and use the sum of the 50 cost functions as the cost function in the MPC algorithm.

This can easily be done by augmenting the system with (say) 50 times larger state than the deterministic system, 50 times as many disturbance input sequences, and a corresponding output. The number of control inputs is the same as for the deterministic system. I’m sure this fits into your MPC algorithm.

This is a very intuitive way to do robust MPC, although not the most efficient. But simple to understand, and (I think) simple to implement.

1 Like

Coincidently I am currently running through pretty much the same… well… pain.

With a new data-intensive (sysID) project I said to myself that it is perhaps the time to try to learn and adopt the more DataFrame-ish workflow (resembling they Python’s Pandas) instead of handling the time series at the very low level as pure vectors (the habit I had developed in Matlab before the advent of their timetables).

First, DataFrames.jl does not support the date/datetime index by itself. I have found the TSFrames.jl package that brings date/datetime index into DataFrames. Mostly the resulting “object” indeed behaves transparently as DataFrame, but there are some glitches.

And I confess that I am struggling with even the elementary steps. Mainly because I am not yet fluent with that dataframe-ish way of working, but also because even very basic functionality is typically provided by some other packages, and it takes quite some time to find them and learn which are usable…

Oh, how I would love to see a time-series extension of the online book https://juliadatascience.io to time-series workflows. How to do imputing (perhaps using Impute.jl, you may find it useful too), how to remove outliers (I am only learning if can use the Hampel filter that I mentioned here earlier, or is there anything reasonably well maintained?) which tool currently best fits into this workflow for smoothing, resampling, the latter being my the step I am currently struggling with. Then how to do cumsum for the columns of the DataFrame/TSFrame (here I got lost in the rabbit holes here in the discussion forum). RollingFunction.jl is in the to-do (to-study) list to learn if and how I could then apply other filters such as Savitzky-Golay.jl.

I am still enjoying it but I admit it is tempting to go back just to “long vectors”.

2 Likes

I found that if I fit the model on 1-step ahead predictions, that fit doesn’t perform well if I simulate that model forwards for 24 hours, even if disturbances are mostly absent. I guess that’s because the dynamics are so slow and the 1-step ahead prediction doesn’t capture that. A good observer is not a good predictor in this case :sweat_smile:

I think this is the right approach, because it also matches how the model would be used in reality. If it’s not too much effort could you maybe give a rough sketch how you would approach this with LowLevelParticleFilters.jl?

EEs unite!

1 Like

Something like this pseudo code

function cost(pars)
    kf = FilterConstructor(pars)
    c = zero(eltype(pars))
    for time in all_timepoints[2:end]
        update!(kf, u[time-1], y[time-1])
        x_sim, u_sim, y_sim = rollout(kf, state(kf), u[time:time+24])

        c += sum((y_sim .- y[time:time+24]).^2) / length(y_sim)
    end
    return c / length(all_timepoints)
end

The sum of squared error cost can be replaced by, e.g., loglikelihood of prediction errors if you step forward with repeated calls to predict! instead of the rollout call above, but this should be a reasonable starting place.

1 Like

Here’s something less pseudo

using Pkg
cd(@__DIR__)
Pkg.activate(".")
using LowLevelParticleFilters
using LowLevelParticleFilters: SimpleMvNormal
using StaticArrays
using LinearAlgebra
using Random
using Plots
using StatsPlots
using Optim
using SeeToDee

## System definition: Damped nonlinear pendulum with forcing
# This is a simple but interesting nonlinear system with interpretable parameters

"""
    pendulum_dynamics(x, u, p, t)

Nonlinear pendulum dynamics: ẍ = -b*ẋ - g/L*sin(x) + u/m
State: x = [angle, angular_velocity]
Parameters: p = [damping_coefficient, g/L, 1/m]
"""
function pendulum_dynamics(x, u, p, t)
    b, gL, inv_m = p
    θ, ω = x
    u_val = u === nothing ? 0.0 : u[1]

    SA[
        ω,
        -b*ω - gL*sin(θ) + inv_m*u_val
    ]
end

"""
    measurement(x, u, p, t)

Measurement function: we can observe the angle directly
"""
measurement(x, u, p, t) = SA[x[1]]

## Generate synthetic data with known parameters
Random.seed!(42)

# True system parameters
p_true = [0.2, 9.81, 1.0]  # [damping, g/L, 1/m]

# Simulation settings
Ts = 0.01  # Sample time
T = 2000    # Number of time steps
t = 0:Ts:(T-1)*Ts

# Generate input signal (sinusoidal forcing)
u = [SA[0.5*sin(0.3*t_i)] for t_i in t]

# System dimensions
nx = 2  # [angle, angular_velocity]
nu = 1  # [force]
ny = 1  # [angle measurement]

# Noise covariances
R1 = Diagonal(SA[0.0001, 0.001])  # Small process noise
R2 = Diagonal(SA[0.01])         # Measurement noise

# Initial state
x0 = SA[0.5, 0.0]  # Start at angle=0.5 rad, velocity=0

# Create discrete-time dynamics using RK4 integration
discrete_dynamics = SeeToDee.Rk4(pendulum_dynamics, Ts, supersample=2)

# Generate true trajectory using rollout
x_true = LowLevelParticleFilters.rollout(discrete_dynamics, x0, u, p_true)

# Add noise to create measurements
y = [measurement(x, u[i], p_true, t[i]) + SA[0.1*randn()] for (i,x) in enumerate(x_true[1:end-1])]

# Add process noise to true states for more realistic scenario
x_true_noisy = [x_true[i] + SA[sqrt(R1[1,1])*randn(), sqrt(R1[2,2])*randn()] for i in 1:length(x_true)-1]

println("Generated $(length(y)) measurements from nonlinear pendulum system")
println("True parameters: damping=$(p_true[1]), g/L=$(p_true[2]), 1/m=$(p_true[3])")

## Define cost function using multi-step prediction error

"""
    multistep_prediction_cost(p, prediction_horizon)

Cost function that minimizes multi-step prediction errors.

At each time point:
1. Update filter with observation y[t-1]
2. Perform rollout for `prediction_horizon` steps
3. Calculate squared prediction errors
"""
function multistep_prediction_cost(p::Vector{T}, prediction_horizon=5) where T
    # Create filter with current parameter guess
    d0 = SimpleMvNormal(T.(x0), T.(R1))
    ekf = ExtendedKalmanFilter(discrete_dynamics, measurement, R1, R2, d0; nu, ny, p=p)

    # Track filter state with first observation
    correct!(ekf, u[1], y[1], p, 1*Ts)

    cost = zero(T)
    n_predictions = 0

    # Loop through time points (starting from t=2)
    for t_idx in 2:length(y)
        # Update filter with observation at t-1
        predict!(ekf, u[t_idx], p, t_idx*Ts)
        correct!(ekf, u[t_idx], y[t_idx], p, t_idx*Ts)

        # Get current state estimate
        x_current = state(ekf)

        # Determine how many steps we can predict
        steps_remaining = length(y) - t_idx
        n_steps = min(prediction_horizon, steps_remaining)

        if n_steps > 0
            # Perform multi-step rollout
            u_future = u[t_idx+1:t_idx+n_steps]
            x_sim = LowLevelParticleFilters.rollout(discrete_dynamics, x_current, u_future, p)

            # Calculate prediction errors
            for k in 1:n_steps
                y_pred = measurement(x_sim[k], u[t_idx+k], p, (t_idx+k)*Ts)
                y_true = y[t_idx+k]
                error = y_pred - y_true
                cost += sum(abs2, error)
                n_predictions += 1
            end
        end
    end

    # Normalize by number of predictions
    return cost / n_predictions
end

## Perform parameter estimation

# Initial parameter guess (perturbed from true values)
p_guess = p_true .* (1.0 .+ 0.3*randn(length(p_true)))
p_guess = max.(p_guess, 0.01)  # Ensure positive parameters

println("\nInitial parameter guess: $p_guess")
println("Initial cost: $(multistep_prediction_cost(p_guess, 5))")

# Optimize using BFGS
println("\nOptimizing parameters using multi-step prediction cost (horizon=5)...")
result = Optim.optimize(
    p -> multistep_prediction_cost(p, 45),
    p_guess,
    BFGS(),
    Optim.Options(
        show_trace = true,
        store_trace = true,
        show_every = 5,
        iterations = 100,
        g_tol = 1e-8,
    );
    autodiff = :forward
)

p_estimated = Optim.minimizer(result)
println("\n" * "="^60)
println("Optimization complete!")
println("True parameters:      $p_true")
println("Estimated parameters: $p_estimated")
println("Relative error:       $(abs.((p_estimated .- p_true) ./ p_true))")
println("="^60)

## Generate trajectories with estimated parameters for comparison

x_estimated = LowLevelParticleFilters.rollout(discrete_dynamics, x0, u, p_estimated)
y_estimated = [measurement(x, u[i], p_estimated, t[i]) for (i,x) in enumerate(x_estimated[1:end-1])]

x_initial_guess = LowLevelParticleFilters.rollout(discrete_dynamics, x0, u, p_guess)
y_initial_guess = [measurement(x, u[i], p_guess, t[i]) for (i,x) in enumerate(x_initial_guess[1:end-1])]

## Visualize results

# Plot 1: Measurements and predictions
p1 = plot(t, [y[i][1] for i in 1:length(y)], label="Measurements",
          marker=:circle, markersize=2, alpha=0.5,
          xlabel="Time [s]", ylabel="Angle [rad]", title="Multi-Step Prediction Estimation")
plot!(p1, t, [x_true[i][1] for i in 1:length(t)], label="True trajectory", lw=2)
plot!(p1, t, [x_initial_guess[i][1] for i in 1:length(t)], label="Initial guess",
      lw=2, linestyle=:dot, color=:red, alpha=0.7)
plot!(p1, t, [x_estimated[i][1] for i in 1:length(t)], label="Estimated model",
      lw=2, linestyle=:dash)

# Plot 2: Parameter estimates
param_names = ["Damping b", "g/L", "1/m"]
p2 = groupedbar(param_names, [p_true p_guess p_estimated],
         label=["True" "Initial guess" "Estimated"],
         title="Parameter Comparison",
         ylabel="Parameter Value",
         legend=:topright, bar_width=0.7)

# Plot 3: Cost function over optimization
p3 = plot(getproperty.(result.trace, :value),
         xlabel="Iteration",
         ylabel="Cost",
         title="Optimization Convergence",
         yscale=:log10,
         lw=2,
         legend=false)

# Combine plots
plot(p1, p2, p3, layout=(3,1), size=(800, 900))
display(current())

6 Likes

Wow, that’s amazing stuff! Can’t believe how short and readable that code is, considering how much it’s doing. Might be a nice example for the gallery as well.

I adapted your script to work with an LTI that is closer to the problem I am trying to solve. So I moved to ControlSystems.jl and c2d.

Since you are obviously the expert on all of these packages could you perhaps take a look at this script in case you spot anything that could be done better? I would really appreciate it :slight_smile:

(it is self-contained so it should be possible to just run it)

using Pkg
cd(@__DIR__)
Pkg.activate(".")
using LowLevelParticleFilters
using LowLevelParticleFilters: SimpleMvNormal
using StaticArrays
using LinearAlgebra
using Random
using Plots
using StatsPlots
using Optimization
using ControlSystems
using Distributions
using OptimizationOptimJL


# basic 3-state LTI with 3 inputs and 2 outputs
"""
    thermal_dynamics(p)

Create continuous-time state-space system for thermal model.

Arguments:
- p: Parameters [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]

Returns: Continuous-time StateSpace system
"""
function thermal_dynamics(p)
    C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w = p

    A = SA[
        (-1/R_ia+-1/R_ie+-1/R_ih)/C_i 1/(C_i*R_ie) 1/(C_i*R_ih);
        1/(C_e*R_ie) (-1/R_ea+-1/R_ie)/C_e 0;
        1/(C_h*R_ih) 0 -1/(C_h*R_ih)
    ]

    B = SA[
        1/(C_i*R_ia) 0 A_w/C_i;
        1/(C_e*R_ea) 0 A_e/C_e;
        0 1/C_h 0
    ]

    C = SA[
        1.0 0.0 0.0;
        0.0 0.0 1.0
    ]

    D = @SArray zeros(2, 3)

    return ss(A, B, C, D)
end

# true parameter values [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]
p_true = [12.0, 45.0, 0.3, 3.0, 1.5, 70.0, 5.0, 1.2, 8.0]

# simulation settings
Ts = 0.25   # Sample time [hours]
n_days = 4 # Number of days to simulate
T = Int(floor(n_days * 24 / Ts))  # Number of time steps
t = 0:Ts:(T-1)*Ts

# we have three measured disturbances to consider:
# u[1]: (measured) ambient temperature
# u[2]: (measured) heat supplied
# u[3]: (measured) solar radiation

# ambient temperature [°C]
u_1 = 10 .+ 8 * sin.(2π * (t .- 6) / 24)

# prbs input for heat supplied
Random.seed!(42)
u_2 = zeros(T)
for k in 1:T
    if rand() < 0.01
        u_2[k] = 10.0 * (rand() - 0.5)
    elseif k > 1
        u_2[k] = u_2[k-1]
    end
end

# heat supplied (non-negative) [kW]
u_2 = max.(u_2, 0.0)

# solar radiation [kW/m²]
u_3 = 0.4 .* max.(sin.(2π * (t .- 6) / 24), 0.0)

# noise covariances
R1 = Diagonal(SA[0.01, 0.01, 0.01])
R2 = Diagonal(SA[0.5, 1.0])

# discrete-time system
sys_c = thermal_dynamics(p_true)
sys_d = c2d(sys_c, Ts, :tustin)

"""
    measurement(x, u, p, t)

Measurement function: we observe T_i (indoor temperature) and T_h (heater temperature)
State: x = [T_i, T_e, T_h]
Output: y = [T_i, T_h]
"""
measurement(x, u, p, t) = SA[x[1], x[3]]

# generate true trajectory using lsim
x0 = SA[20.0, 15.0, 22.0]  # initial state
# u needs to be a matrix of size (num_inputs, num_timesteps)
u = [u_1'; u_2'; u_3']  # 3 x T matrix
result = lsim(sys_d, u, t, x0=x0)

# extract states (result.x is num_states x num_timesteps)
x_true = result.x'  # transpose to get num_timesteps x num_states for plotting

# visualize the true states
plot(t, x_true, xlabel="Time [hours]", ylabel="Temperature [°C]", label=["T_i" "T_e" "T_h"],
    title="True States", color=[:red :black :blue], linewidth=2)

# add noise to create measurements using measurement function
y = [measurement(x_true[i, :], u[:, i], p_true, t[i]) + rand(MvNormal(SA[0.0, 0.0], R2)) for i in axes(x_true, 1)]

# add process noise to true states for more realistic scenario
x_noisy = [x_true[i, :] + rand(MvNormal(SA[0.0, 0.0, 0.0], R1)) for i in axes(x_true, 1)]

println("Generated $(length(y)) measurements from thermal system")
println("True parameters: C_i=$(p_true[1]), C_e=$(p_true[2]), C_h=$(p_true[3]), R_ia=$(p_true[4]), R_ie=$(p_true[5]), R_ea=$(p_true[6]), R_ih=$(p_true[7]), A_e=$(p_true[8]), A_w=$(p_true[9])")

## Define cost function using multi-step prediction error

function rollout!(x::Vector, dynamics, x0, u, p)
    x[1] = x0
    for k in eachindex(u)
        x[k+1] = dynamics.A * x[k] + dynamics.B * u[k]
    end
    return x
end

"""
    multistep_prediction_cost(p, sys, prediction_horizon=5)

Cost function that minimizes multi-step prediction errors.

At each time point:
1. Update filter with observation y[t-1]
2. Perform rollout for `prediction_horizon` steps
3. Calculate squared prediction errors

Arguments:
- p: Parameters to evaluate
- sys: Discrete-time state-space system
- prediction_horizon: Number of steps to predict ahead
"""
function multistep_prediction_cost(p::Vector{T}, sys, prediction_horizon=5) where T
    # intial condition for Kalman filter
    d0 = SimpleMvNormal(T.(x0), T.(R1))
    kf = KalmanFilter(sys.A, sys.B, sys.C, zeros(size(sys.D)), T.(R1), T.(R2), d0; nu=3, ny=2, p=p, check=false)

    # track filter state with first observation
    correct!(kf, u[:, 1], y[1], p, 1 * Ts)

    # pre-allocate error storage (2 outputs x max_errors)
    max_errors = (lastindex(y) - 1) * prediction_horizon
    errors = zeros(T, 2, max_errors)
    n_predictions = 0

    # loop through time points starting from t=2
    for t_idx in 2:lastindex(y)
        # update filter with observation at t-1
        predict!(kf, u[:, t_idx], p, t_idx * Ts)
        correct!(kf, u[:, t_idx], y[t_idx], p, t_idx * Ts)

        # get current state estimate
        x_current = state(kf)

        # determine how many steps we can predict
        steps_remaining = length(y) - t_idx
        n_steps = min(prediction_horizon, steps_remaining)

        if n_steps > 0
            # extract future inputs
            u_future = [u[:, t_idx+k] for k in 1:n_steps]

            # pre-allocate state trajectory (includes initial state)
            x_traj = Vector{typeof(x_current)}(undef, n_steps + 1)

            # perform multi-step prediction using rollout!
            rollout!(x_traj, sys, x_current, u_future, p)

            # calculate and store prediction errors
            for k in 1:n_steps
                # use measurement function to get predicted output
                y_pred = measurement(x_traj[k+1], u[:, t_idx+k], p, (t_idx + k) * Ts)
                y_true = y[t_idx+k]

                # store prediction error (each column is one error vector)
                n_predictions += 1
                errors[:, n_predictions] = y_pred - y_true
            end
        end
    end

    # compute cost from all errors
    cost = zero(T)
    for i in 1:n_predictions
        cost += sum(abs2, errors[:, i])
    end

    # normalize by number of predictions
    return cost / n_predictions
end

"""
    estimate_thermal_parameters(;
                                 p_guess=nothing,
                                 prediction_horizon=10,
                                 max_iterations=100,
                                 g_tol=1e-8,
                                 show_trace=false,
                                 lb=nothing,
                                 ub=nothing)

Estimate thermal model parameters from measurement data.

Uses global variables:
- x0: Initial state [T_i, T_e, T_h]
- u: Input matrix (3 x num_timesteps) with rows [T_a, Q_h, Q_s]
- y: Measurement sequence [SVector{2}(T_i_meas, T_h_meas), ...]
- Ts: Sample time (hours)
- R1: Process noise covariance (3x3)
- R2: Measurement noise covariance (2x2)

Keyword Arguments:
- p_guess: Initial parameter guess [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]. If nothing, uses defaults
- prediction_horizon: Number of steps for multi-step prediction cost
- max_iterations: Maximum optimization iterations
- g_tol: Gradient tolerance for convergence
- show_trace: Whether to show optimization progress
- lb: Lower bounds for parameters. If nothing, uses reasonable defaults
- ub: Upper bounds for parameters. If nothing, uses reasonable defaults

Returns: NamedTuple with fields:
- parameters: Estimated parameters [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]
- result: Full Optimization.jl result
- cost: Final cost value
- converged: Whether optimization converged
- initial_guess: Initial parameter guess used
"""
function estimate_thermal_parameters(;
    p_guess=nothing,
    prediction_horizon=10,
    max_iterations=100,
    g_tol=1e-8,
    show_trace=false,
    lb=nothing,
    ub=nothing
)

    # Generate initial guess if not provided
    if p_guess === nothing
        # default guess: [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]
        p_guess = max.([10.0, 40.0, 1.0, 2.0, 1.0, 5.0, 0.5, 2.0, 4.0] .+ 0.5 .* randn(9), 0.1)
    end

    # Set reasonable default bounds if not provided
    # Parameters: [C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w]
    if lb === nothing
        lb = [
            0.1,    # C_i: minimum thermal capacitance indoor air [kWh/K]
            0.1,    # C_e: minimum thermal capacitance envelope [kWh/K]
            0.01,   # C_h: minimum thermal capacitance heating system [kWh/K]
            0.01,   # R_ia: minimum thermal resistance indoor-ambient [K/kW]
            0.01,   # R_ie: minimum thermal resistance indoor-envelope [K/kW]
            0.1,    # R_ea: minimum thermal resistance envelope-ambient [K/kW]
            0.01,   # R_ih: minimum thermal resistance indoor-heating [K/kW]
            0.01,   # A_e: minimum solar absorption envelope [-]
            0.01    # A_w: minimum solar absorption windows [-]
        ]
    end

    if ub === nothing
        ub = [
            100.0,  # C_i: maximum thermal capacitance indoor air [kWh/K]
            200.0,  # C_e: maximum thermal capacitance envelope [kWh/K]
            50.0,   # C_h: maximum thermal capacitance heating system [kWh/K]
            20.0,   # R_ia: maximum thermal resistance indoor-ambient [K/kW]
            10.0,   # R_ie: maximum thermal resistance indoor-envelope [K/kW]
            100.0,  # R_ea: maximum thermal resistance envelope-ambient [K/kW]
            20.0,   # R_ih: maximum thermal resistance indoor-heating [K/kW]
            10.0,   # A_e: maximum solar absorption envelope [-]
            20.0    # A_w: maximum solar absorption windows [-]
        ]
    end

    # ensure initial guess is within bounds
    p_guess = clamp.(p_guess, lb, ub)

    # define optimization problem
    function cost_fn(p, _)
        sys_c_temp = thermal_dynamics(p)
        sys_d_temp = c2d(sys_c_temp, Ts, :tustin)
        return multistep_prediction_cost(p, sys_d_temp, prediction_horizon)
    end

    # create optimization function
    optf = OptimizationFunction(cost_fn, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(optf, p_guess, nothing; lb=lb, ub=ub)

    # run the parameter estimation optimization
    result = solve(
        prob,
        OptimizationOptimJL.LBFGS(),
        maxiters=max_iterations,
        show_trace=show_trace,
        store_trace=true,
        extended_trace=true,
        g_tol=g_tol
    )

    # extract estimated parameters
    p_estimated = result.u

    # extract cost history from trace
    cost_history = [t.value for t in result.original.trace]

    return (
        parameters=p_estimated,
        result=result,
        cost=result.objective,
        converged=result.retcode == ReturnCode.Success,
        initial_guess=p_guess,
        cost_history=cost_history,
    )
end

# initial guess (perturbed from true values)
p_guess = max.(p_true .* (1.0 .+ 0.5 * randn(length(p_true))), 0.01)

# estimate parameters
result = estimate_thermal_parameters(
    p_guess=p_guess,
    prediction_horizon=24 * 4,
    max_iterations=350,
    show_trace=true
)

# display results
println("\n" * "="^60)
println("Optimization complete!")
println("True parameters:      $(p_true)")
println("Estimated parameters: $(result.parameters)")
println("Relative error:       $(abs.((result.parameters .- p_true) ./ p_true))")
println("Initial guess:        $(result.initial_guess)")
println("Final cost:           $(result.cost)")
println("Converged:            $(result.converged)")
println("="^60)

# plot parameter comparison
param_names = ["C_i", "C_e", "C_h", "R_ia", "R_ie", "R_ea", "R_ih", "A_e", "A_w"]
p1 = groupedbar(param_names, [p_true result.initial_guess result.parameters],
    label=["True" "Initial guess" "Estimated"],
    title="Parameter Comparison",
    ylabel="Parameter Value",
    legend=:topright, bar_width=0.7)

# plot cost convergence
p2 = plot(
    result.cost_history,
    xlabel="Iteration",
    ylabel="Cost",
    title="Optimization Convergence",
    label="Cost",
    linewidth=2,
    yscale=:log10,
    legend=:topright,
    marker=:circle,
    markersize=3
)

# combine plots
plot(p1, p2, layout=(2, 1), size=(1200, 800))
display(current())

I am not really sure why there is a sudden jump in the cost function trace and then afterwards it’s just flat.

I’m not sure why the cost function jumps either, but RC models can easily suffer from identifiability issues, it’s often only possible to estimate RC, not R and C separately. A quick check to see if this is an issue is to compute the eigenvalues of the Hessian of the cost at the optimum, if it’s singular, you have an identifiability issue and should switch to a single RC parameter. If the hessian becomes singular, maybe the optimizer freaks out?

You can also use StructuralIdentifiability.jl, it can tell you if you have such issues, and possibly which combinations of parameters you can identify using find_identifiable_functions(ode).


EDIT: You have a 3dim state and a single input, such a model has a canonical form with 6 parameters but you’re estimating 9, so my guess is that the Hessian has at least a 3dim nullspace


EDIT2: I see now that you have multiple inputs and outputs, in that case the canonical form analysis is a bit more complicated, and I’d use the Hessian or SI.jl analysis instead.

1 Like

Thanks! You are right about the identifability potentially being an issue. It seems that in this specific structure the problem is somewhat mitigated:

#! format: off
ode = @ODEmodel(
    x1'(t) = (-1 / R_ia - 1 / R_ie - 1 / R_ih) / C_i * x1(t)
             + 1 / (C_i * R_ie) * x2(t)
             + 1 / (C_i * R_ih) * x3(t)
             + 1 / (C_i * R_ia) * u1(t)
             + A_w / C_i * u3(t), 
             
    x2'(t) = 1 / (C_e * R_ie) * x1(t)
            + (-1 / R_ea - 1 / R_ie) / C_e * x2(t)
            + 1 / (C_e * R_ea) * u1(t)
            + A_e / C_e * u3(t), 

    x3'(t) = 1 / (C_h * R_ih) * x1(t)
            - 1 / (C_h * R_ih) * x3(t)
            + 1 / C_h * u2(t), 
            
    y1(t) = x1(t),
    y2(t) = x3(t)
)
#! format: on

assess_identifiability(ode)

This outputs:

[ Info: Search for polynomial generators concluded in 2.72061246
[ Info: Selecting generators in 0.009230624
[ Info: Inclusion checked with probability 0.9955 in 3.476992608 seconds
[ Info: Global identifiability assessed in 35.721917875 seconds
OrderedCollections.OrderedDict{Any, Symbol} with 12 entries:
  x1(t) => :globally
  x2(t) => :globally
  x3(t) => :globally
  A_e   => :globally
  A_w   => :globally
  C_e   => :globally
  C_h   => :globally
  C_i   => :globally
  R_ea  => :globally
  R_ia  => :globally
  R_ie  => :globally
  R_ih  => :globally

I also ran find_identifiable_functions(ode), it just returns the original parameters.

Now to find out what’s causing the jump.

tried Ipopt.jl as well, doesn’t seem to have any issues:

It’s interesting to see that the largest capacitor C_e is being consistently underestimated. I may have to go to even longer time horizons to get an accurate estimate of this parameter.