How to Kalman filter

question

#1

I need to improve on estimated locations of a moving object using prior knowledge about its speed (speed changes are small) and angular speed (its direction doesn’t change much either). For background, this is a continuation from this post.

I read this excellent explanation about Kalman filters, and checked out the Kalman, StateSpace, DataAssim, and StateSpaceRoutines packages, but I’m not sure exactly how to implement any of the functionalities in those packages.

Has anyone had any experience with improving estimated locations of a moving object with a known range of speeds and angular speeds using a Kalman filter?

As always, thanks for any input!


Help implement Kalman filter
#2

For what it is worth: if you have a background in basic statistics, I always find the Kalman Filter much easier to understand, interpret, and derive thinking like a Bayesian with a hidden markov model. There are a million derivations along those lines, but this is as good as any: https://lectures.quantecon.org/jl/kalman.html

Can’t help you on the particular problem due to rusty mechanics, but your best bet is to start with the physics. The best approach is to assume you know everything and write down the math in a linear state space model, where the state vector to use should be clear if you get the physics right. If you get all of the matrices right for the evolution equation, then the observation equation just comes down to figuring out how the underlying state is measured (which is problem dependent).


#3

I’m not convinced that a Kalman filter is appropriate in this case (see below), but here goes. As @jlperla said, the first step is to write down a process model as a discrete time linear state space model with Gaussian process noise, i.e. something of the form

\mathbf{x}[k] = \mathbf{A} \mathbf{x}[k -1] + \mathbf{B} \mathbf{u}[k] + \mathbf{w}[k]

(assuming a time-invariant model, \mathbf{A} and \mathbf{B} don’t change in time), with \mathbf{x}[k] the state at time k, \mathbf{u} the input, and \mathbf{w} the process noise, distributed as \mathcal{N}(0, \mathbf{Q}). The zero mean is important for the theoretical guarantees of a Kalman filter to hold, but you’ll still get some kind of filtering if that assumption doesn’t hold.

In your case, you could use a first-order model. With a first-order model, following @Tamas_Papp's comment (Help implement Kalman filter), you could have \mathbf{x} = \left( \begin{smallmatrix} x \\ y \end{smallmatrix} \right), the x- and y-coordinates of the tip.

The process model could simply be

\mathbf{x}[k] = \mathbf{x}[k - 1] + \mathbf{w}[k]

i.e., \mathbf{A} = \mathbf{I}, \mathbf{B} = 0, so that the next state is just the previous state with some noise added to it (a Gaussian random walk). This model formalizes how you expect the tip to behave. Now, given the videos you showed, that doesn’t seem quite right, because you don’t really expect to the tip to move upward as often as it moves downward (which messes with the assumption of zero mean Gaussian noise), so you could consider the following model instead:

\mathbf{x}[k] = \mathbf{x}[k -1] + \begin{bmatrix} 0 \\ \alpha \end{bmatrix} \mathbf{u} [k] + \mathbf{w}[k]

where \alpha is some negative constant and \mathbf{u}[k] = (g), the magnitude of gravitational acceleration.

Even that doesn’t seem quite right, because it seems like when the tip is going right, it’s more likely that it’ll keep going right. So you could use a second order model (with velocities/changes in x and maybe y as additional states), but I won’t go into that right now. Again, it matters for the theoretical guarantees of the filter. It also matters for the performance in practice, but it probably won’t matter that much in this case.

Next, you need a measurement model, that is, a model of the form

\mathbf{y}[k] = \mathbf{C} \mathbf{x}[k] + \mathbf{v}[k]

where \mathbf{y} is the measurement (which is supposed to give some information about the current state), and \mathbf{v}, the measurement noise, is distributed as \mathcal{N}(0, \mathbf{R}). In your case, you’ve got direct measurements of both x and y from the images. You’d probably just find the pixel that is brightest within some area, and call its location your measurement. Honestly, that is probably such a good measurement of the tip location that you don’t even need a Kalman filter anymore, but I’ll carry on anyway. Your measurement model will be simply

\mathbf{y}[k] = \mathbf{I} \mathbf{x}[k] + \mathbf{v}[k]

Now you need to choose the covariances of the process noise (\mathbf{Q}) and measurement noise (\mathbf{R}). ‘Officially’, those should come from e.g. the specs of your sensor or a system identification procedure, but practically, they’re often just used as knobs that you can tweak. They’re measures of how uncertain you are of the process model and measurement model that you wrote down. If your measurement noise covariance is small and your process noise covariance is large, then the Kalman filter will predominantly follow your measurements without doing much smoothing. Vice versa, if the process noise covariance is small and the measurement noise covariance is large, then the Kalman filter will ‘trust’ the measurements less and smooth them more based on the process model.

Once you have \mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{C}, and \mathbf{R}, as well as \mathbf{u}[i] and \mathbf{y}[i] for all times i that are in the past, and an estimate of \mathbf{x}[0] and the initial state covariance to get things started, you can just ‘turn the crank’ as it were, and use the standard Kalman filter equations to get your ‘optimal’ estimate of \mathbf{x}[k] (optimal assuming that your process model and measurement model are accurate).

Actually, if you wanted to, you could probably do better than Kalman filtering. I’m assuming you probably don’t have to do this in real time. That is, you can collect tip location data for all the time instances you’re interested in, and determine a smoothed tip location trajectory afterwards. The setup of Kalman filtering assumes that you only have access to data from the past, which means that you necessarily introduce some lag between the estimated tip location and the actual tip location as you’re filtering. But since you know tip locations for all times, you can actually do Kalman smoothing instead of filtering by using data from both the past and the future at any given time instant, and avoid introducing lag.

But again, all of this probably doesn’t matter since it seems like you get such clean data from your images that Kalman filtering isn’t really needed (in the Kalman filter terminology, \mathbf{R} is really small compared to \mathbf{Q}).


#4

I agree with @tkoolen that the Kalman filter is probably overkill, and in case you still want to invest in it, the Kalman smoother adds little extra cost but a lot of benefits. I also think that, @tkoolen’s nice summary aside, it will be difficult to apply this methodology without basic understanding of how it works. Like @jlperla, I find the Bayesian derivation the most useful. I usually recommend the books of Harrison & West, eg this one. If I remember correctly, it starts with the random walk version.


#5

That is a nice writeup, @tkoolen . I also doubt that Kalman filtering here is the right thing as there is too little observation noise such that the error by model linearization dominates. But could we use this writeup and the test data to make an illustrative notebook using https://github.com/mschauer/Kalman.jl ? By now we have everything in place


#6

Yep, go right ahead.

Here’s the source for the post:

I'm not convinced that a Kalman filter is appropriate in this case (see below), but here goes. As `@jlperla` said, the first step is to write down a process model as a discrete time linear state space model with Gaussian process noise, i.e. something of the form
$$
\mathbf{x}[k] = \mathbf{A} \mathbf{x}[k -1] + \mathbf{B} \mathbf{u}[k] + \mathbf{w}[k]
$$
(assuming a time-invariant model, $\mathbf{A}$ and $\mathbf{B}$ don't change in time), with $\mathbf{x}[k]$ the state at time $k$, $\mathbf{u}$ the input, and $\mathbf{w}$ the process noise, distributed as $\mathcal{N}(0, \mathbf{Q})$. The zero mean is important for the theoretical guarantees of a Kalman filter to hold, but you'll still get some kind of filtering if that assumption doesn't hold.

In your case, you could use a first-order model. With a first-order model, following `@Tamas_Papp`'s comment (https://discourse.julialang.org/t/help-implement-kalman-filter/8718/2?u=tkoolen), you could have $\mathbf{x} = \left( \begin{smallmatrix} x \\ y \end{smallmatrix} \right)$, the $x$- and $y$-coordinates of the tip.

The process model could simply be
$$
\mathbf{x}[k] = \mathbf{x}[k - 1] + \mathbf{w}[k]
$$
i.e., $\mathbf{A} = \mathbf{I}$, $\mathbf{B} = 0$, so that the next state is just the previous state with some noise added to it (a Gaussian random walk). This model formalizes how you expect the tip to behave. Now, given the videos you showed, that doesn't seem quite right, because you don't really expect to the tip to move upward as often as it moves downward (which messes with the assumption of zero mean Gaussian noise), so you could consider the following model instead:
$$
\mathbf{x}[k] = \mathbf{x}[k -1]  + \begin{bmatrix}
0 \\
\alpha
\end{bmatrix}
\mathbf{u} [k] + \mathbf{w}[k]
$$
where $\alpha$ is some negative constant and $\mathbf{u}[k] = (g)$, the magnitude of gravitational acceleration.

Even that doesn't seem quite right, because it seems like when the tip is going right, it's more likely that it'll keep going right. So you could use a second order model (with velocities/changes in $x$ and maybe $y$ as additional states), but I won't go into that right now. Again, it matters for the theoretical guarantees of the filter. It also matters for the performance in practice, but it probably won't matter that much in this case.

Next, you need a measurement model, that is, a model of the form
$$
\mathbf{y}[k] = \mathbf{C} \mathbf{x}[k] + \mathbf{v}[k]
$$
where $\mathbf{y}$ is the measurement (which is supposed to give some information about the current state), and $\mathbf{v}$, the measurement noise, is distributed as $\mathcal{N}(0, \mathbf{R})$. In your case, you've got direct measurements of both $x$ and $y$ from the images. You'd probably just find the pixel that is brightest within some area, and call its location your measurement. Honestly, that is probably such a good measurement of the tip location that you don't even need a Kalman filter anymore, but I'll carry on anyway. Your measurement model will be simply
$$
\mathbf{y}[k] = \mathbf{I} \mathbf{x}[k] + \mathbf{v}[k]
$$

Now you need to choose the covariances of the process noise ($\mathbf{Q}$) and measurement noise ($\mathbf{R}$). 'Officially', those should come from e.g. the specs of your sensor or a system identification procedure, but practically, they're often just used as knobs that you can tweak. They're measures of how uncertain you are of the process model and measurement model that you wrote down. If your measurement noise covariance is small and your process noise covariance is large, then the Kalman filter will predominantly follow your measurements without doing much smoothing. Vice versa, if the process noise covariance is small and the measurement noise covariance is large, then the Kalman filter will 'trust' the measurements less and smooth them more based on the process model.

Once you have $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, $\mathbf{C}$, and $\mathbf{R}$, as well as $\mathbf{u}[i]$ and $\mathbf{y}[i]$ for all times $i$ that are in the past, and an estimate of $\mathbf{x}[0]$ and the initial state covariance to get things started, you can just 'turn the crank' as it were, and use the standard Kalman filter equations to get your 'optimal' estimate of $\mathbf{x}[k]$ (optimal assuming that your process model and measurement model are accurate).

Actually, if you wanted to, you could probably do better than Kalman filtering. I'm assuming you probably don't have to do this in real time. That is, you can collect tip location data for all the time instances you're interested in, and determine a smoothed tip location trajectory afterwards. The setup of Kalman filtering assumes that you only have access to data from the past, which means that you necessarily introduce some lag between the estimated tip location and the actual tip location as you're filtering. But since you know tip locations for all times, you can actually do Kalman _smoothing_ instead of filtering by using data from both the past and the future at any given time instant, and avoid introducing lag.

But again, all of this probably doesn't matter since it seems like you get such clean data from your images that Kalman filtering isn't really needed (in the Kalman filter terminology, $\mathbf{R}$ is really small compared to $\mathbf{Q}$).

#7

Wow thanks a bunch @tkoolen and people! I’ll try to munch through your writeup on Monday. @mschauer, a notebook would be most useful! Keep me in the loop, I can contribute with (if nothing else) some “cool” real data to filter…


#8

Turns out that having a Kalman filter and a second order model may be worthwhile. Mostly because you can center the window at the predicted value instead of the current value, but also while the measurement error is small most of the time the error can be substantial if two roots come close to each other for example.

rootstrack2


#9

Dear Lord @mschauer! Do share the code?! :grinning:


#11

Have you seen my ping on slack?


#12

No. Checking!


#13

Kalman filters are numerically unstable and their use should be avoided. You should be using what is called a square-root filter which, according to the folklore, doubles the precision. The basic problem with the KF is that it involves numerically unstable ‘squaring’ operations. An initial lead on this topic can be found at https://en.wikipedia.org/wiki/Kalman_filter#Square_root_form.

I am not familiar with your area of application, but if there is no good reason for your system disturbances to be independent, you might consider using innovations state space models (Anderson & Moore, Optimal Filtering, 1979, Chapter 9). This opens up the possibility of using much simpler exponential smoothing methods - Hyndman et al, Forecasting with Exponential Smoothing: A State Space Approach, Springer, 2008.


#14

Good tip, but I would consider a Kalman filter in square root form to be a (good) way to implement a Kalman filter, as opposed to an entirely separate thing.


#15

This perhaps not important if d=2 and the noise matrix is not singular.


#16

Good tip, but I would consider a Kalman filter in square root form to be a
(good) way to implement a Kalman filter, as opposed to an entirely separate
thing.

There is little doubt that the Kalman filter can be adapted using something
like a Cholesky factorisation to ensure that truncation errors do not lead
to variance matrices which fail to be positive definite. However, my
understanding is that even greater accuracy can be achieved using QR
methods e.g. Paige, Christopher C and Saunders, MA (1977) ,SIAM Journal on
Numerical Analysis, 180-193


#17

My experience with the KF (limited to inference in DSGE models) is that when you are relying on the numerical advantage of QR (or even more sophisticated, SVD) vs Cholesky, then you are already in trouble.


#18

Does anyone here know by any chance the QuantEcon.jl package and the Kalman routines that are included there? (https://github.com/QuantEcon/QuantEcon.jl/blob/master/src/kalman.jl)

I would like to have two state variables, but only one variable that is observed. I know the underlying laws of motions for the states (i.e. I can assume them to differ), but somehow I am not sure how I am supposed to define the matrices in this case.

For now I do

A_b = [1.0 1.0]'
G_b = [1.0 0.0
       0.0 r/(1-par.ψ)]
Q_b = [par.σ_ε 0.0
       0.0 par.σ_ξ]
R_b = [par.σ_η]

kalman_bubbly = Kalman(A_b, G_b, Q_b, R_b)

But that does not seem the correct way to write it, since the updating routine throws an error. Does anyone have an idea how to define it correctly?