Identification using LowLevelParticleFilters.jl

I am looking for an example (or some guidance) on how to fit a Kalman smoother to data using log-likelihood maximization using LowLevelParticleFilters.jl.

I already have a model structure and have collected a dataset. It is a fairly simple 3-state LTI system, of which I can measure 2. It models the (very slow) thermal dynamics of a building in the presence of several large disturbances. I have decently accurate data on a few of these, like the ambient temperature and the solar irradiance. But others I am still missing, like people and heat gains from electronics. (it is actually very similar to the content described in the tutorials by Baggepinnen)

Now my goals are as follows:

  • Calibrate the model parameters for 24 hour simulation performance. I will use this in an MPC context.
  • Calibrate the process and measurement noise covariance matrices.

I augmented the state-space model with an integrating disturbance, I tried to follow: Disturbance gallery Ā· LowLevelParticleFilters Documentation

I already followed the tutorials on youtube by Baggepinnen. I also read through the documentation, and followed the tutorials on JuliaHub.

First, I understand I need to disretize the linear model. As described in How to Tune a Kalman Filter: Step-by-Step Guide | JuliaHub, we can use the matrix exponential for this. I implemented the discrete dynamics as follows:

function setup_filter(p::Vector{T}) where {T}
    # unpack parameters
    C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w = p[1:9]
    σi, σe, σh, σq = p[10:13] # process noise std dev
    σmi, σmh = p[14:15] # measurement noise std dev (interior thermostat, supply temp)
    Ti0, Te0, Th0 = p[16:18] # initial state 

    # continuous-time state-space matrices (3x3 system)
    A_cont = @SMatrix [
        (-1/R_ia-1/R_ie-1/R_ih)/C_i 1/(C_i*R_ie) 1/(C_i*R_ih);
        1/(C_e*R_ie) (-1/R_ea-1/R_ie)/C_e 0;
        1/(C_h*R_ih) 0 -1/(C_h*R_ih)
    ]
    B_cont = @SMatrix [
        1/(C_i*R_ia) 0 A_w/C_i;
        1/(C_e*R_ea) 0 A_e/C_e;
        0 1/C_h 0
    ]
    G = @SMatrix [1 / C_i; 0; 0]

    # x_aug = [Ti, Te, Th, q]' where q is the integrating disturbance
    A_cont_aug = @SMatrix [
        A_cont[1, 1] A_cont[1, 2] A_cont[1, 3] G[1];
        A_cont[2, 1] A_cont[2, 2] A_cont[2, 3] G[2];
        A_cont[3, 1] A_cont[3, 2] A_cont[3, 3] G[3];
        0.0 0.0 0.0 0.0
    ]

    B_cont_aug = @SMatrix [
        B_cont[1, 1] B_cont[1, 2] B_cont[1, 3];
        B_cont[2, 1] B_cont[2, 2] B_cont[2, 3];
        B_cont[3, 1] B_cont[3, 2] B_cont[3, 3];
        0.0 0.0 0.0
    ]

    # augmented AB matrix for discretization
    AB_cont = zeros(7, 7)
    AB_cont[1:4, 1:4] = Matrix(A_cont_aug)
    AB_cont[1:4, 5:7] = Matrix(B_cont_aug)

    # discretize
    AB_disc = exp(AB_cont * Ts)
    A_disc = SMatrix{4,4}(AB_disc[1:4, 1:4])
    B_disc = SMatrix{4,3}(AB_disc[1:4, 5:7])

    # y = C * x + 0 * u --> output is [Ti, Th]
    C = @SMatrix [1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0]

    # process noise covariance R1
    R1 = Diagonal([σi^2 / C_i^2, σe^2 / C_e^2, σh^2 / C_h^2, σq^2 / C_i^2])

    # measurement noise covariance R2
    R2 = Diagonal([σmi^2, σmh^2])

    # initial state distribution d0
    d0 = MvNormal(T.([Ti0, Te0, Th0, 0.0]), Diagonal([1.0, 5.0, 5.0, 1.0]))

    return KalmanFilter(
        A_disc, B_disc, C, zeros(ny, nu),
        R1, R2, d0; Ts, p
    )
end

# Re-initialize with corrected filter
p = [p_init[Param] for Param in
     ["C_i", "C_e", "C_h", "R_ia", "R_ie", "R_ea", "R_ih", "A_e", "A_w",
    "σi", "σe", "σh", "σq", "σmi", "σmh", "Ti0", "Te0", "Th0"]]

kf = setup_filter(p);
measure = LowLevelParticleFilters.measurement(kf)

The problem here is that it doesn’t seem to be possible to autodiff through this, because of the matrix exponential. When I replace the matrix exponential by a simple forward euler step (not ideal for accuracy), I still get autodiff complaints that KalmanFilter constructor in LowLevelParticleFilters.jl is internally calling eigvals() which then fails.

Should I use the c2d(ss) as described here instead? Will that work with autodiff?

Discretization Ā· LowLevelParticleFilters Documentation

I would appreciate some guidance on how to approach this problem.

1 Like

You can use

c2d(ss(Ac, Bc, Cc, Dc), Ts, :tustin)

for AD compatibility and then pass KalmanFilter(..., check=false) to turn off the check that calls eigvals in the KalmanFilter constructor. This check is just there to catch the user forgetting to perform discretization and can safely be skipped when you know that you have discretized the model.

1 Like

Thank you!

So this setup should in principle work?

function setup_filter(p::Vector{T}) where {T}
    # unpack parameters
    C_i, C_e, C_h, R_ia, R_ie, R_ea, R_ih, A_e, A_w = p[1:9]
    σi, σe, σh, σq = p[10:13] # process noise std dev
    σmi, σmh = p[14:15] # measurement noise std dev (interior thermostat, supply temp)
    Ti0, Te0, Th0 = p[16:18] # initial state 

    # continuous-time state-space matrices (3x3 system)
    A_cont = @SMatrix [
        (-1/R_ia-1/R_ie-1/R_ih)/C_i 1/(C_i*R_ie) 1/(C_i*R_ih);
        1/(C_e*R_ie) (-1/R_ea-1/R_ie)/C_e 0;
        1/(C_h*R_ih) 0 -1/(C_h*R_ih)
    ]
    B_cont = @SMatrix [
        1/(C_i*R_ia) 0 A_w/C_i;
        1/(C_e*R_ea) 0 A_e/C_e;
        0 1/C_h 0
    ]
    G = @SMatrix [1 / C_i; 0; 0]

    # x_aug = [Ti, Te, Th, q]' where q is the integrating disturbance
    A_aug = @SMatrix [
        A_cont[1, 1] A_cont[1, 2] A_cont[1, 3] G[1];
        A_cont[2, 1] A_cont[2, 2] A_cont[2, 3] G[2];
        A_cont[3, 1] A_cont[3, 2] A_cont[3, 3] G[3];
        0.0 0.0 0.0 0.0
    ]

    B_aug = @SMatrix [
        B_cont[1, 1] B_cont[1, 2] B_cont[1, 3];
        B_cont[2, 1] B_cont[2, 2] B_cont[2, 3];
        B_cont[3, 1] B_cont[3, 2] B_cont[3, 3];
        0.0 0.0 0.0
    ]

    # state-space discretization
    C = @SMatrix [1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0]
    D = @SMatrix zeros(ny, nu)
    ss_disc = c2d(ss(A_aug, B_aug, C, D), Ts, :tustin)
    A_disc = SMatrix{4,4}(ss_disc.A)
    B_disc = SMatrix{4,3}(ss_disc.B)

    # process noise covariance R1
    R1 = Diagonal([σi^2 / C_i^2, σe^2 / C_e^2, σh^2 / C_h^2, σq^2 / C_i^2]) + 1e-9I

    # measurement noise covariance R2
    R2 = Diagonal([σmi^2, σmh^2]) + 1e-9I

    # initial state distribution d0
    d0 = MvNormal(T.([Ti0, Te0, Th0, 0.0]), Diagonal([1.0, 5.0, 5.0, 1.0]))

    return KalmanFilter(
        A_disc, B_disc, C, D,
        R1, R2, d0; Ts, p, check=false
    )
end

# re-initialize with corrected filter
p = [p_init[Param] for Param in
     ["C_i", "C_e", "C_h", "R_ia", "R_ie", "R_ea", "R_ih", "A_e", "A_w",
    "σi", "σe", "σh", "σq", "σmi", "σmh", "Ti0", "Te0", "Th0"]]

kf = setup_filter(p);
measure = LowLevelParticleFilters.measurement(kf)

With cost function:

function cost(plog::Vector{T}) where T
    p = exp10.(plog)
    kf = setup_filter(p)
    try
        return -LowLevelParticleFilters.loglik(kf, u, y, p)
    catch
        return T(Inf)
    end
end


p0 = log10.(p)
cost(p0)

res = Optim.optimize(
    cost,
    p0,
    BFGS(linesearch=LineSearches.BackTracking()),
    Optim.Options(
        show_trace = true,
        show_every = 5,
        iterations = 1000,
        time_limit = 130,
        f_reltol   = 1e-3,
        f_abstol   = 1e-3,
    ),
    autodiff = :forward,
)

popt = exp10.(res.minimizer)

Unfortunately c2d is throwing an error:

ArgumentError: matrix contains Infs or NaNs

Stacktrace:
  [1] chkfinite
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:105 [inlined]
  [2] getrf!(A::Matrix{Float64}, ipiv::Vector{Int64}; check::Bool)
    @ LinearAlgebra.LAPACK ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:582
  [3] getrf!
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:580 [inlined]
  [4] getrf!
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:787 [inlined]
  [5] #lu!#182
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:91 [inlined]
  [6] lu!
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:90 [inlined]
  [7] lu!
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:89 [inlined]
  [8] _lu
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:347 [inlined]
  [9] lu(::Matrix{Float64}; kwargs::@Kwargs{})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:341
 [10] lu
    @ ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lu.jl:341 [inlined]
 [11] \(A::Matrix{Float64}, B::Matrix{Float64})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.11.6+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1132
 [12] c2d_x0map(sys::StateSpace{ControlSystemsBase.Continuous, Float64}, Ts::Float64, method::Symbol; w_prewarp::Int64)
    @ ControlSystemsBase ~/.julia/packages/ControlSystemsBase/qOZW0/src/discrete.jl:81
 [13] c2d_x0map
    @ ~/.julia/packages/ControlSystemsBase/qOZW0/src/discrete.jl:43 [inlined]
 [14] c2d(sys::StateSpace{ControlSystemsBase.Continuous, Float64}, Ts::Float64, method::Symbol)
    @ ControlSystemsBase ~/.julia/packages/ControlSystemsBase/qOZW0/src/discrete.jl:32
 [15] setup_filter(p::Vector{Float64})
    @ Main ~/dev/warmteplanner-model/scripts/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X62sdnNjb2RlLXJlbW90ZQ==.jl:39
 [16] cost(plog::Vector{Float64})
    @ Main ~/dev/warmteplanner-model/scripts/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X65sdnNjb2RlLXJlbW90ZQ==.jl:16
 [17] value!!(obj::OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}, x::Vector{Float64})
    @ NLSolversBase ~/.julia/packages/NLSolversBase/n7XXO/src/interface.jl:9
 [18] value!(obj::OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}, x::Vector{Float64})
    @ NLSolversBase ~/.julia/packages/NLSolversBase/n7XXO/src/interface.jl:28
 [19] value!(obj::Optim.ManifoldObjective{OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}}, x::Vector{Float64})
    @ Optim ~/.julia/packages/Optim/7krni/src/Manifolds.jl:31
 [20] (::LineSearches.var"#Ļ•#1"{Optim.ManifoldObjective{OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}}, Vector{Float64}, Vector{Float64}, Vector{Float64}})(α::Float64)
    @ LineSearches ~/.julia/packages/LineSearches/b4CwT/src/LineSearches.jl:24
 [21] (::BackTracking{Float64, Int64})(Ļ•::LineSearches.var"#Ļ•#1"{Optim.ManifoldObjective{OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}}, Vector{Float64}, Vector{Float64}, Vector{Float64}}, αinitial::Float64, Ļ•_0::Float64, dĻ•_0::Float64)
    @ LineSearches ~/.julia/packages/LineSearches/b4CwT/src/backtracking.jl:63
 [22] BackTracking
    @ ~/.julia/packages/LineSearches/b4CwT/src/backtracking.jl:34 [inlined]
 [23] BackTracking
    @ ~/.julia/packages/LineSearches/b4CwT/src/backtracking.jl:24 [inlined]
 [24] perform_linesearch!(state::Optim.BFGSState{Vector{Float64}, Matrix{Float64}, Float64, Vector{Float64}}, method::BFGS{InitialStatic{Float64}, BackTracking{Float64, Int64}, Nothing, Nothing, Flat}, d::Optim.ManifoldObjective{OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}})
    @ Optim ~/.julia/packages/Optim/7krni/src/utilities/perform_linesearch.jl:58
 [25] update_state!(d::OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}, state::Optim.BFGSState{Vector{Float64}, Matrix{Float64}, Float64, Vector{Float64}}, method::BFGS{InitialStatic{Float64}, BackTracking{Float64, Int64}, Nothing, Nothing, Flat})
    @ Optim ~/.julia/packages/Optim/7krni/src/multivariate/solvers/first_order/bfgs.jl:143
 [26] optimize(d::OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}, initial_x::Vector{Float64}, method::BFGS{InitialStatic{Float64}, BackTracking{Float64, Int64}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}, state::Optim.BFGSState{Vector{Float64}, Matrix{Float64}, Float64, Vector{Float64}})
    @ Optim ~/.julia/packages/Optim/7krni/src/multivariate/optimize/optimize.jl:65
 [27] optimize
    @ ~/.julia/packages/Optim/7krni/src/multivariate/optimize/optimize.jl:43 [inlined]
 [28] #optimize#77
    @ ~/.julia/packages/Optim/7krni/src/multivariate/optimize/interface.jl:274 [inlined]
 [29] top-level scope
    @ ~/dev/warmteplanner-model/scripts/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X65sdnNjb2RlLXJlbW90ZQ==.jl:29

I have also tried :fweuler. That does work and converges. However the estimate is somewhat nonsensical.

I think that is related to the issue with the cost function here: QuanserInterface.jl/examples/pendulum_identification.jl at main Ā· JuliaComputing/QuanserInterface.jl

The line search can sometimes take too long steps, testing parameters that are no good. I sometimes have luck with

using Optim, Optim.LineSearches
...
BFGS(alphaguess = LineSearches.InitialStatic(alpha=0.1))

or some other small value.

You could move the setup_filter also inside the try-catch to guard against stuff like that

Thanks, I can try that.

If I want to optimize for 24 hour simulation performance, I guess I can’t use LowLevelParticleFilters.loglik because it will use all the measured data while filtering as well?

Optimization of a several-step prediction objective is rather expensive, you would essentially have to implement a manual loop that performs the filtering, and at each time step performs 24h worth of prediction to yield the predicted value to compare to data in the cost function. In other words, the O(N_{data}) complexity of one-step prediction minimization turns in to O(N_{data}N_{pred}). This is certainly not impossible, and if you don’t have too fast sampling it might not be very slow either

I think this method is described here already, if that is what you are referring to? Identification of unstable systems Ā· ControlSystemIdentification Documentation

The commonly used sampling rate in literature is only 15 minutes. I have even seen papers that use 30 minutes or even 60 minutes. Should be feasible.

No not quite. This page illustrates problems with simulation of unstable systems, but your system is probably not unstable so it’s not relevant. Pure simulation-error minimization also have O(N_{data}) complexity, it’s just the combination of filtering and simulation (multi-step prediction) that happens to be expensive to compute.

I think I got confused somewhere along the way. I don’t think filtering is important when optimizing for simulation performance. When you’re simulating 24 hours into the future, you don’t have access to any measured outputs nor any measured disturbances besides some forecasts like irradiance and ambient temperature. I think that ā€˜tuning the covariance matrices’ only makes sense if you’re continuously filtering? I think I’m a bit lost :smiley:

If you have a data set that spans much more than 24 hours you need to filter up until the point where each 24hrs simulation starts. if you’re familiar with model-predictive control, think of the filtering as whatever is leading up to the ā€œcurrent point in timeā€, and the simulation into the future as the simulation performed in the optimizer to optimize the future performance over the prediction horizon. It would be easier to explain this with a whiteboard, but I hope you get the picture, at each time point along the trajectory you start a 24hr simulation from the current filter estimate.

Simulation in this context is the same as repeatedly running the prediction step in the Kalman filter, without running the correction step where the measurements are incorporated.

1 Like

Thank you for taking the time to explain that, I get it now.

Yes exactly. I think it’s kind of like this:

Now in theory you could build one big optimization problem that is a function of both model parameters p and process and measurement noise covariance matrices R_1 and R_2. However like you said, the computational complexity of this would be quite large. I can’t recall i’ve ever seen this approach before either.

Like I mentioned the main issue with the data is the large, unmeasured disturbances.

So I was looking at an approach like this: System identification for building thermal systems under the presence of unmeasured disturbances in closed loop operation: Lumped disturbance modeling approach - ScienceDirect

If you have a minute to look at this paper (it’s straightforward I think) I would be curious to hear your opinion. My concern with this method is that you can give the optimizer such a large amount of freedom that it can essentially make up any disturbance such that the model will fit the data.

I’ve done some building identification myself with the ControlSystemIdentification.jl package before. From personal experience a major hurdle was the orders of magnitude difference between nominal parameter values (resistances in order of 1e-2 K/W and capacitances in order of 1e6-1e8 J/K). You can easily account for this by doing something like below when setting up the Kalman filter, which should dramatically improve convergence.

R_ia = p[1] * 1e-2
C_i = p[2] * 1e6
...

Also from personal experience, RC models sometimes have difficulties converging just because they don’t fit the data well. The black-box method using newpem from ControlSystemIdentification.jl has drastically improved my fits at the cost of interpretability.

1 Like

Thanks, that’s useful information. In my model I have defined capacitances in kWh/C and resistances in C/kW. The parameter estimates are somewhat on the same order of magnitude:

Yes, I have seen the same thing. However prediction error is also significantly easier to optimize than the simulation performance. The models I found that were accurate in PEM were not accurate in simulation of longer horizons (24 hours).

ControlSystemIdentification has an option to set the prediction horizon to infinity, which basically optimises the open loop simulation error (keyword argument focus=:simulation). For a finite horizon, the model adds extra states to the system to account for the extra timesteps to predict, but this is not the case when optimising open loop performance. You can still afterwards fit a Kalman filter that to estimate the state of your fitted model.

This is at least how I handle my fits. I do work with simulated ā€œmeasurementā€ data which includes pretty complete information of the disturbances, so probably your use case is a bit more difficult.

If I look at the implementation of newpem: ControlSystemIdentification.jl/src/pem.jl at b30bfbe4544836eae169695e939047af405d6757 Ā· baggepinnen/ControlSystemIdentification.jl

So h is the parameter in ControlSystems.jl/lib/ControlSystemsBase/src/matrix_comps.jl at 349e217c8227fc9208f7a86ca74b0701b6b24505 Ā· JuliaControl/ControlSystems.jl

Which says.

If sys is discrete, the prediction horizon h may be specified, in which case measurements up to and including time t-h and inputs up to and including time t are used to predict y(t).

To me this sounds like h controls how many measurements are included to estimate the current measurement. While I am interested in the opposite, I want to predict multiple outputs from what is essentially only one measurement (the initial condition). But maybe PredictionStateSpace is doing some magic that I dont understand.

To me that reads as being the prediction for y(t) given [y(t-h), y(t-h-1)...] and [u(t), u(t-1)... u(t-h), u(t-h-1)...], which should be exactly what you need. However, as @baggepinnen has mentioned that tends to be pretty hard for longer horizons, which is why I recommend the focus = :simulation setting which will just give you a fit that optimises the open loop simulation performance.

I am trying to estimate the state-space parameters such that I can estimate y(t+1), y(t+2), ..., y(t+k) for up to 24 hours into the future. So the word ā€˜measurements’ in this case doesn’t make much sense to me, because there are no measurements in the future (besides my forecasted disturbances)

I think the main problem is that it would give too much flexibility to the optimizer, all of that is unlikely observable. For example, the Kalman filter is invariant to a scaling of R_1, R_2 with the same scale factor.

As mentioned above, the argument h to newpem allows you to optimize the h-step prediction error. When this prediction is formed, it does not use the last h output samples, but it does use the input samples. This h-step prediction error compares only this h-step prediction to the available data, not the entire prediction trajectory up to the hth step, which it sounds like you want.

You still have a large dataset comprised of measurements, do you not? And when you compute a cost function, you compare the output of your filter or simulator to measurements from this dataset. Even if the prediction \hat{y}(t+24|t) is formed including measurements up to time t only, you still need a measurement at time t+24 from the data set to compare the prediction error e = y(t+24) - \hat{y}(t+24|t)?

1 Like