I’m not convinced that a Kalman filter is appropriate in this case (see below), but here goes. As @jlperla said, the first step is to write down a process model as a discrete-time linear state-space model with Gaussian process noise, i.e. something of the form
\mathbf{x}[k] = \mathbf{A} \mathbf{x}[k -1] + \mathbf{B} \mathbf{u}[k] + \mathbf{w}[k]
(assuming a time-invariant model, \mathbf{A} and \mathbf{B} don’t change in time), with \mathbf{x}[k] the state at time k, \mathbf{u} the input, and \mathbf{w} the process noise, distributed as \mathcal{N}(0, \mathbf{Q}). The zero mean is important for the theoretical guarantees of a Kalman filter to hold, but you’ll still get some kind of filtering if that assumption doesn’t hold.
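To make the notation concrete, here’s a minimal sketch (my own, using only the standard library) of what that process model means computationally: each step multiplies the previous state by \mathbf{A}, adds the input through \mathbf{B}, and adds a draw of Gaussian noise with covariance \mathbf{Q}. The function name and arguments are just placeholders.

```julia
# Sketch of the generic process model x[k] = A*x[k-1] + B*u[k] + w[k], w[k] ~ N(0, Q).
using LinearAlgebra

function simulate_process(A, B, Q, x0, us)
    L = cholesky(Symmetric(Q)).L          # so that L * randn(...) ~ N(0, Q)
    xs = [x0]
    for u in us
        w = L * randn(size(Q, 1))         # process noise sample
        push!(xs, A * xs[end] + B * u + w)
    end
    return xs
end
```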
In your case, you could start with a first-order model. Following @Tamas_Papp’s comment (Help implement Kalman filter - #2 by Tamas_Papp), you could have \mathbf{x} = \left( \begin{smallmatrix} x \\ y \end{smallmatrix} \right), the x- and y-coordinates of the tip.
The process model could simply be
\mathbf{x}[k] = \mathbf{x}[k - 1] + \mathbf{w}[k]
i.e., \mathbf{A} = \mathbf{I}, \mathbf{B} = 0, so that the next state is just the previous state with some noise added to it (a Gaussian random walk). This model formalizes how you expect the tip to behave. Now, given the videos you showed, that doesn’t seem quite right, because you don’t really expect the tip to move upward as often as it moves downward (which messes with the assumption of zero-mean Gaussian noise), so you could consider the following model instead (both variants are sketched in code below):
\mathbf{x}[k] = \mathbf{x}[k - 1] + \begin{bmatrix} 0 \\ \alpha \end{bmatrix} \mathbf{u}[k] + \mathbf{w}[k]
where \alpha is some negative constant and \mathbf{u}[k] = (g), the magnitude of gravitational acceleration.
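As a rough sketch of how you might write down both process-model variants: the noise scale σ_p and the drift coefficient α below are made-up placeholder values you would tune or fit for your data, not numbers from your setup.

```julia
using LinearAlgebra

# Variant 1: pure Gaussian random walk, x[k] = x[k-1] + w[k]
A = Matrix{Float64}(I, 2, 2)             # next state = previous state
σ_p = 2.0                                # assumed tip motion per frame, in pixels
Q = σ_p^2 * Matrix{Float64}(I, 2, 2)     # process noise covariance

# Variant 2: add a constant downward drift driven by gravity
α = -0.5                                 # assumed (negative) drift coefficient
g = 9.81                                 # magnitude of gravitational acceleration
B = reshape([0.0, α], 2, 1)              # gravity only affects the y-coordinate
u = [g]                                  # the same input at every time step
# For variant 1, simply take B = zeros(2, 1) and u = [0.0].
```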
Even that doesn’t seem quite right, because it seems like when the tip is going right, it’s more likely to keep going right. So you could use a second-order model (with velocities/changes in x and maybe y as additional states), but I won’t go into that in detail right now. Again, how well the model matches reality matters for the theoretical guarantees of the filter. It also matters for the performance in practice, but it probably won’t matter that much in this case.
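For what it’s worth, a common second-order choice would be a ‘constant velocity’ model, something like the following sketch (just for reference, not used further below):

```julia
# Second-order ("constant velocity") model with state (x, y, vx, vy),
# Δt being the time between frames.
Δt = 1.0                        # one frame per step
A2 = [1.0  0.0  Δt   0.0;
      0.0  1.0  0.0  Δt;
      0.0  0.0  1.0  0.0;
      0.0  0.0  0.0  1.0]
```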
Next, you need a measurement model, that is, a model of the form
\mathbf{y}[k] = \mathbf{C} \mathbf{x}[k] + \mathbf{v}[k]
where \mathbf{y} is the measurement (which is supposed to give some information about the current state), and \mathbf{v}, the measurement noise, is distributed as \mathcal{N}(0, \mathbf{R}). In your case, you’ve got direct measurements of both x and y from the images. You’d probably just find the pixel that is brightest within some area, and call its location your measurement. Honestly, that is probably such a good measurement of the tip location that you don’t even need a Kalman filter anymore, but I’ll carry on anyway. Your measurement model will be simply
\mathbf{y}[k] = \mathbf{I} \mathbf{x}[k] + \mathbf{v}[k]
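Here’s a rough sketch of both the brightest-pixel measurement and this measurement model. The region-of-interest handling and the noise scale σ_m are assumptions for you to adapt, and `frame` is assumed to be a grayscale image stored as a matrix.

```julia
using LinearAlgebra

# Brightest pixel within a region of interest, used as the measurement y[k].
# `rows` and `cols` are index ranges delimiting the region of interest.
function brightest_pixel(frame, rows, cols)
    roi = @view frame[rows, cols]
    idx = argmax(roi)                    # CartesianIndex of the brightest pixel
    return [cols[idx[2]], rows[idx[1]]]  # (x, y) in image coordinates
end

C = Matrix{Float64}(I, 2, 2)             # we measure the state (tip position) directly
σ_m = 1.0                                # assumed ~1 pixel of measurement noise
R = σ_m^2 * Matrix{Float64}(I, 2, 2)
```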
Now you need to choose the covariances of the process noise (\mathbf{Q}) and measurement noise (\mathbf{R}). ‘Officially’, those should come from e.g. the specs of your sensor or a system identification procedure, but practically, they’re often just used as knobs that you can tweak. They’re measures of how uncertain you are of the process model and measurement model that you wrote down. If your measurement noise covariance is small and your process noise covariance is large, then the Kalman filter will predominantly follow your measurements without doing much smoothing. Conversely, if the process noise covariance is small and the measurement noise covariance is large, then the Kalman filter will ‘trust’ the measurements less and smooth them more based on the process model.
Once you have \mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{C}, and \mathbf{R}, as well as \mathbf{u}[i] and \mathbf{y}[i] for all times i that are in the past, and an estimate of \mathbf{x}[0] and the initial state covariance to get things started, you can just ‘turn the crank’ as it were, and use the standard Kalman filter equations to get your ‘optimal’ estimate of \mathbf{x}[k] (optimal assuming that your process model and measurement model are accurate).
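Those equations are short enough to just write out by hand (there are also packages for this, e.g. Kalman.jl or StateSpaceModels.jl, but I haven’t checked which would fit your workflow best). A minimal sketch, assuming the matrices defined above:

```julia
using LinearAlgebra

# Textbook Kalman filter. us[k] and ys[k] are the input and measurement at step k;
# x0 and P0 are the initial state estimate and its covariance.
function kalman_filter(A, B, Q, C, R, x0, P0, us, ys)
    x, P = x0, P0
    estimates = typeof(x0)[]
    for (u, y) in zip(us, ys)
        # Predict: push the previous estimate through the process model.
        x = A * x + B * u
        P = A * P * A' + Q
        # Update: correct the prediction using the measurement.
        S = C * P * C' + R               # innovation covariance
        K = (P * C') / S                 # Kalman gain
        x = x + K * (y - C * x)
        P = (I - K * C) * P
        push!(estimates, x)
    end
    return estimates
end
```

With the sketches above, `us` would just be a vector of `[g]`s (or `[0.0]`s for the pure random walk) and `ys` would come from `brightest_pixel` applied to each frame.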
Actually, if you wanted to, you could probably do better than Kalman filtering. I’m assuming you don’t have to do this in real time, i.e. you can collect tip location data for all the time instants you’re interested in and determine a smoothed tip trajectory afterwards. The setup of Kalman filtering assumes that you only have access to data from the past, which means that you necessarily introduce some lag between the estimated tip location and the actual tip location as you’re filtering. But since you know the tip locations for all times, you can do Kalman smoothing instead of filtering by using data from both the past and the future at any given time instant, and avoid introducing that lag.
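The usual way to do that is a Rauch–Tung–Striebel (RTS) smoother, a backwards pass over the filtered estimates. Roughly, it looks like the sketch below; it needs the filtered and one-step-predicted means/covariances from the forward pass, so the `kalman_filter` sketch above would have to be extended to store and return those as well.

```julia
# RTS smoother sketch. xf/Pf: filtered means/covariances; xp/Pp: predicted
# (pre-update) means/covariances, both saved from the forward Kalman filter pass.
function rts_smoother(A, xf, Pf, xp, Pp)
    n = length(xf)
    xs, Ps = copy(xf), copy(Pf)          # smoothed estimates, initialized at the final step
    for k in n-1:-1:1
        G = (Pf[k] * A') / Pp[k+1]       # smoother gain
        xs[k] = xf[k] + G * (xs[k+1] - xp[k+1])
        Ps[k] = Pf[k] + G * (Ps[k+1] - Pp[k+1]) * G'
    end
    return xs, Ps
end
```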
But again, all of this probably doesn’t matter since it seems like you get such clean data from your images that Kalman filtering isn’t really needed (in the Kalman filter terminology, \mathbf{R} is really small compared to \mathbf{Q}).