Covariance Optimization for Joint Estimation using UKF

Hello, I am using an Unscented Kalman Filter (UKF) to perform joint state and parameter estimation.
My primary goal is to improve the accuracy of the parameter estimates (the fit to the measured data is less important).

Filter design

My dynamical model is a nonlinear, pseudo-first-order system with 7 first-order states v_i and 7 “derivative” states z_i, such that \dot{v}_i(t) = z_i is trivial while the \dot{z}_i derivatives are nonlinear.

x = [v_1, \dots, v_7, z_1, \dots, z_7]^T

Additionally, I am trying to estimate three gain parameters \theta = [A, B, C]^T, so my augmented state vector is x_\theta = [x; \theta]^T. Unfortunately, I can only indirectly observe the first three states; my observation function is g(x(t)) = x_1(t) - x_2(t) - x_3(t).
The filter matches the measured signal nicely, but ultimately I am more interested in estimating the time-varying parameters \theta(t), ideally constrained to a physically meaningful range \ge 0.
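For reference, the measurement function I pass to the UKF is essentially just this difference (sketch; the (x, u, p, t) signature follows the LowLevelParticleFilters convention):

using StaticArrays

# Only an indirect combination of the first three states is observed.
measurement(x, u, p, t) = SA[x[1] - x[2] - x[3]]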

Question:

How can I optimize the UKF to improve parameter estimation accuracy?

Attempt

At the moment I am trying to optimize the covariance matrices Q and R of the UKF by computing the cost as a log-likelihood based on state prediction errors (I know the true states from simulation) and minimizing it with a particle swarm algorithm.
My setup function and cost function are implemented as follows:

using LowLevelParticleFilters, Optim, StaticArrays, LinearAlgebra, Distributions

function setup_filter(Q, R)
  return UnscentedKalmanFilter(
    discrete_dynamics_inv, measurement,
    Q, R, MvNormal(x0, Q);  # process noise, measurement noise, initial state distribution
    ny = 1, nu = 1, Ts = ts,
    weight_params = kfpar
  )
end

function cost(pars)
  try  # guard against PosDefException aborting the optimization
    σ1 = pars[1:17]
    σ2 = [pars[18]]

    Q = SMatrix{17,17}(Diagonal(σ1))  # process noise covariance
    R = SMatrix{1,1}(Diagonal(σ2))    # measurement noise covariance

    filter = setup_filter(Q, R)
    ll = loglik_x(filter, u, y, x_tr)  # y, x_tr from a long simulation
                                       # in which theta changes gradually
    return -ll  # negate the log-likelihood so the optimizer minimizes
  catch e
    return eltype(pars)(Inf)
  end
end

I then create an initial guess (based on literature) and optimize:

σₘ = 1e-2 # observation noise
α₁ = 1e-3 # parameter base noise
α₂ = 1e-7 # states noise
α₃ = 1.0  # noise of the state driven by the input u
A = 5.0; B = 22.0; C = 15.0 # approximate initial parameter values
p_guess = [α₂,α₂,α₂,α₂,α₂,α₂,α₂,α₃,α₂,α₂,α₂,α₂,α₂,α₂,α₁*A^2,α₁*B^2,α₁*C^2, σₘ]

res = Optim.optimize(
    cost,
    p_guess,
    ParticleSwarm( n_particles = 30),  
    Optim.Options(
        show_trace        = true,
        show_every        = 1,
        iterations        = 10,
        time_limit        = 30,
    )
)

My issue is that I am unsure whether this is even a good approach to improve parameter estimation accuracy, or completely misguided, and whether I should instead:

  • optimize the other Kalman filter parameters (such as the sigma-point spread) instead of Q \& R?
  • compute the log-likelihood only from the parameters A, B, C rather than from the full x_\theta (roughly as in the sketch after this list)?
  • use a different optimization strategy instead of particle swarm?
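To make the second bullet concrete, what I imagine is something like the following (untested sketch; `θ_true` is a placeholder for the simulated parameter trajectory, and I assume the filtered estimates are available as `sol.xt`):

# Hypothetical cost that scores only the parameter part of the state.
function parameter_cost(pars)
    Q  = SMatrix{17,17}(Diagonal(pars[1:17]))
    R  = SMatrix{1,1}(Diagonal(pars[18:18]))
    kf = setup_filter(Q, R)
    sol = forward_trajectory(kf, u, y)       # filter the whole data set
    θ_est = reduce(hcat, sol.xt)[15:17, :]   # estimated A, B, C over time
    return sum(abs2, θ_est .- θ_true)        # squared parameter error
end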

I read some of the optimization examples in LowLevelParticleFilters.jl, but since I am not an engineer I am unsure whether they translate to my problem at all. Given that covariance optimization seems to be a crucial step in KF tuning, I wonder if there is a best practice / gold standard I should try first. I hope someone with expertise in this domain is willing to point me in the right direction; thanks in advance! On a side note, I also wonder whether there is a good, proven way to constrain the parameters during the KF estimation so they cannot become negative, perhaps via a log transform (roughly as in the sketch below).
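What I have in mind is to let the filter estimate log(A), log(B), log(C) and exponentiate them wherever they enter the dynamics, so the physical parameters stay positive by construction (hypothetical illustration, not what I currently run):

# Parameters live in log-space inside the filter state.
to_filter_space(θ) = log.(θ)    # applied before augmenting the state
to_physical(θ_log) = exp.(θ_log) # applied inside the dynamics / when reading estimates

θ0     = [5.0, 22.0, 15.0]      # initial guess for A, B, C
θ_log0 = to_filter_space(θ0)
to_physical(θ_log0)             # ≈ θ0, and > 0 for any real-valued θ_log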

This is a reasonable approach. I usually optimize over the standard deviation and square it to get an always-positive variance. This also works for full covariance matrices, where you optimize over the triangular part of a Cholesky factor to get an always positive-definite covariance matrix.
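A minimal sketch of both ideas (variable names are just for illustration):

using LinearAlgebra, StaticArrays

# 1) Optimize over standard deviations; squaring gives nonnegative variances.
σ  = rand(17)                            # stand-in for the optimizer's decision variables
Qd = SMatrix{17,17}(Diagonal(σ .^ 2))    # diagonal and always positive semidefinite

# 2) Full covariance block: optimize over the n(n+1)/2 entries of a lower-triangular
#    Cholesky factor L; Q = L*L' is then positive (semi)definite by construction.
function cov_from_cholvec(Lv, n)
    L = zeros(eltype(Lv), n, n)
    k = 1
    for j in 1:n, i in j:n               # fill the lower triangle column by column
        L[i, j] = Lv[k]
        k += 1
    end
    return L * L' + 1e-10I               # tiny jitter keeps it strictly positive definite
end

Qb = cov_from_cholvec(randn(6), 3)       # 3×3 example block
isposdef(Qb)                             # true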

I would not bother with this until everything else is tuned.

Easy enough to try, LBFGS() is often a good option.

The covariance of the measurement noise can often be measured directly, which at least gives a good starting point. It’s not always the case that the “true” covariance of the measurements gives the best estimation result, due to unforeseen model errors etc., but it is a sensible place to start.
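For a single output this can be as simple as taking the sample variance of a segment where the true signal is (nearly) constant (sketch; `y_still` is a hypothetical stand-in for such a segment):

using Statistics

y_still = 0.1 .+ 1e-2 .* randn(1000)  # hypothetical quiet segment of the measurement
R_guess = var(y_still)                # ≈ 1e-4, a starting value for the R entry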


See also a blog post I wrote a while ago

it goes into a bit more detail on how to tune your model to accommodate disturbances etc. It also explains how the noise driving z and the noise driving v in your model are not independent: you cannot disturb the position without changing the velocity (if you accept the position/speed analogy). This means that you can likely reduce the number of parameters you optimize over and make the optimization problem easier.

I’m not familiar with the term pseudo first order, but I would call this a second-order system: the state dimension required is 2 per component (v_i, z_i).


Have you verified that the system is observable? If it is not, you cannot hope to estimate the parameters. You are searching for 18 parameters given a single observation, which might be hoping for a bit too much, but analysis will tell.

You can perform a linearized observability analysis using the filter you already have by calling

using ControlSystemsBase
ControlSystemsBase.observability(filter, x, u, p, t)

or you could try StructuralIdentifiability.jl (https://github.com/SciML/StructuralIdentifiability.jl), a tool that handles polynomial and rational equations. Another simple way to assess the practical observability is to monitor the estimated covariance matrix of the state; if this appears to “blow up”, the system is not observable using the measurements you have access to. You can plot this using

using Plots
sol = forward_trajectory(filter, u, y)
plot(sol, plotx=false, plotxt=true, plotRt=true, plotu=false, ploty=false, plotyh=false)

First of all: thank you for the prompt response, and for the amazing work on LowLevelParticleFilters and the other packages! Thank you also for the clarifications; this gives me some reassurance and direction for the next steps.
I read the blog post and have some follow-up questions, if you don’t mind:

The noise driving z and the noise driving v in your model are not independent

I completely agree with that notion; I assume this also means that, before optimization, I should redesign the process covariance matrix Q (or R_1) so that it is not a diagonal matrix but includes off-diagonal entries representing these z_i : v_i correlations?

(Furthermore, my model is a system of coupled DEs in the sense that the states denoted v, as well as the “parameters” A, B, C, also appear in the “derivative states” denoted z, potentially leading to even more off-diagonal entries?!)

Regarding this, I wonder how the Symmetric function ensures positive definiteness in your transmission example? It seems to change R1 only by the tiniest amount:

R1d = σf^2 * Bwfd * Bwfd' + σs^2 * Bwsd * Bwsd' + 1e-8I #|> Symmetric # Cov
issymmetric(R1d)  # false :(
isposdef(R1d)       # false :((
R1ds = R1d |> Symmetric # magic!
issymmetric(R1ds)  # true :)
isposdef(R1ds)       # true :))
R1ds .- R1d |> maximum
#8.673617379884035e-19

On the other hand, the deterministic relationship \dot{v}_i = z_i also indicates that each i has only a single free parameter → is this then relevant in the optimization part?

I.e., in the cost function, something along the lines of one of the following:

1: one \alpha per second-order pair?

α₁,α₂,α₃,α₄,α₅,α₆,α₇,β = pars[1:8]
Q = SMatrix{17,17}(Diagonal([α₁,α₂,α₃,α₄,α₅,α₆,α₇, α₁,α₂,α₃,α₄,α₅,α₆,α₇, β*A,β*B,β*C]))

2: one common base \sigma for all second-order pairs?

σ₁,β = pars[1:2]
Q = SMatrix{17,17}(Diagonal([σ₁,σ₁,σ₁,σ₁,σ₁,σ₁,σ₁, σ₁,σ₁,σ₁,σ₁,σ₁,σ₁,σ₁, β*A,β*B,β*C]))

3: or even keep the v’s fixed?

σ₁,β = pars[1:2]
Q = SMatrix{17,17}(Diagonal([1e-8,1e-8,1e-8,1e-8,1e-8,1e-8,1e-8, σ₁,σ₁,σ₁,σ₁,σ₁,σ₁,σ₁, β*A,β*B,β*C]))

Probably a combination of 1 and 3 (v’s kept fixed, seven optimization parameters \sigma_i for the z_i) would make the most sense in my case, e.g. as sketched below.
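i.e. something like (untested sketch, with `pars` standing in for the optimizer’s decision vector):

using LinearAlgebra, StaticArrays

A, B, C = 5.0, 22.0, 15.0          # rough parameter magnitudes as before
pars = vcat(fill(1e-3, 7), 1e-3)   # stand-in for the optimizer's decision vector

σz = pars[1:7]                     # one σ per z_i (as in option 1, but z only)
β  = pars[8]                       # shared scale for the parameter states
qv = fill(1e-8, 7)                 # v states kept fixed and small (option 3)
qz = σz .^ 2                       # squared so the variances stay nonnegative
qθ = β .* [A, B, C] .^ 2           # parameter random-walk variances
Q  = SMatrix{17,17}(Diagonal(vcat(qv, qz, qθ)))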

I think in the blog post you use a similar approach to the third one, multiplying a base \sigma with a boolean matrix:

\sigma \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}

except that the double integrator has only one component, so the translation of this principle to a larger coupled system of ODEs is not immediately apparent to me.

I would call this a second-order system, the state dimension required is 2 per component

Yes, I got things mixed up with the kinetics rate law [B] >> [A]; but you understood anyway: it’s a system whose order was lowered by introducing a variable per component.

Yes. You could learn the best-fitting correlation, or you could assume that the correlation that you’d get in a double-integrator with continuous noise input holds, that is, that you have a block entry of
\sigma_i^2 \begin{pmatrix} \frac{T_s^3}{3} & \frac{T_s^2}{2} \\ \frac{T_s^2}{2} & T_s \end{pmatrix}
for each z_i,v_i component. Whether or not the \sigma_i are unique for each component, or you use a single \sigma for all of them, depends on the specifics.
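A sketch of how these blocks could be placed in Q for your state ordering x_\theta = [v_1, \dots, v_7, z_1, \dots, z_7, A, B, C], where the (v_i, z_i) pair sits at indices (i, i+7) (values are illustrative):

using LinearAlgebra

Ts = 0.01                          # sample time (illustrative value)
σ  = fill(1e-3, 7)                 # one σ per (v_i, z_i) pair
qθ = fill(1e-4, 3)                 # parameter random-walk variances (illustrative)

Q = zeros(17, 17)
for i in 1:7
    vi, zi = i, i + 7              # indices of v_i and z_i in the augmented state
    Q[vi, vi] = σ[i]^2 * Ts^3 / 3
    Q[vi, zi] = Q[zi, vi] = σ[i]^2 * Ts^2 / 2
    Q[zi, zi] = σ[i]^2 * Ts
end
Q[15:17, 15:17] .= Diagonal(qθ)
isposdef(Symmetric(Q))             # true for Ts > 0 and positive σ, qθ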

It depends on what you assume about the evolution of the parameters A, B, C. If they are constant but change at random times, or if they just drift slowly and randomly, there does not have to be any additional correlation. On the other hand, if they change due to changes in z, v, maybe there “should be”. I say “should be” since figuring out what number to attribute to such a correlation might be hard, and the performance gain from finding the right one might be small.

Correct, the parameter in that case is the \sigma_i from above.

The blog post effectively uses number 1: the
\sigma^2 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
is for a continuous-time process, which, when discretized, appears either as the expression above or as the very similar-looking one in the blog when assuming piece-wise constant noise inputs.
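To spell out the connection (a standard derivation, not taken verbatim from the blog): with the double-integrator dynamics A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and continuous-time noise covariance \sigma^2 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, the sampled process-noise covariance is

Q_d = \int_0^{T_s} e^{A\tau}\, \sigma^2 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} e^{A^\top \tau}\, d\tau = \sigma^2 \int_0^{T_s} \begin{pmatrix} \tau^2 & \tau \\ \tau & 1 \end{pmatrix} d\tau = \sigma^2 \begin{pmatrix} \frac{T_s^3}{3} & \frac{T_s^2}{2} \\ \frac{T_s^2}{2} & T_s \end{pmatrix},

using e^{A\tau} = \begin{pmatrix} 1 & \tau \\ 0 & 1 \end{pmatrix}.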