Statistics... cov(): Hermitian?

I’m using the Statistics package with the cov() function to compute covariance of unscented mappings in sigma points (Unscented Kalman Filter). After a couple of iterations, the filter crashes with some message about problems with Cholesky decomposition… which I use to compute the sigma points. So I am debugging…

I assume that the Cholesky problem is caused by the a posteriori covariance matrix incorrectly becoming negative definite.

In the debugging phase…

Question: is the cov() function guaranteed to return a hermitian matrix?

  1. mathematically, yes,

  2. numerically, yes (AFAIK),

  3. in terms of types, no, it returns a Matrix, not a LinearAlgebra.Hermitian or Symmetric. You may want to wrap your result in on of these.

Also, note that cholesky will fail for not strictly pd matrices (zero eigenvalues), there is an option to suppress this.

Finally, if you are coding UKF, presumably you are working from a paper/text that discusses all the numerical issues. If not, you should look for one, there are plenty.

1 Like

This does not directly answer your question but when taking the covariance in contexts where there may be numerical problems either due to dimensionality or otherwise, techniques such as those implemented in CovarianceEstimation.jl should help (where the produced covariance is a convex combination of the sample covariance and a prior matrix like the identity).

In the UKF context though if your a posteriori covariance is near-deficient, that may be indicative of problems with your model itself, for instance if your estimation of the covariance of the state is also near-deficient and/or if the non-linearity is “too” non-linear (iirc).


Thanks, Tamas. The UKF struggles… I use the standard 2nx version, but will try the 2nx+1 version.

The puzzling thing is that the Ensemble KF works without a problem; from past experience, the UKF has worked more smoothly.

Thanks for suggestion. I’ll look back at it if I cannot make the UKF work. Note though, the model is “simple”, so it should work.

1 Like

Yeah if the EKF works fine the UKF should work even better …

Argh. The problem was that Julia is much more sensitive to symmetry in matrices than what I’m used to in MATLAB. I simply had to make the covariance matrix symmetric (hermitian) before I did the Cholesky factorization.

In any way, I also implemented the 2nx version, the 2nx+1 version, and the UKF with assuming linear noise and nonlinear noise. Sort of fun.

Suppose a covariance matrix X becomes indefinite due to numerics, e.g, Cholesky decomposition X = L*L’ doesn’t exist. One solution is to find the smallest eigenvalue e_min, e_min < 0, and repair X by replacing it with X + |e_min|*I. Probably not efficient.

Possible alternative? Find X = LDL’ where D is diagonal. Then find d_min, smallest diagonal element of D, and if d_min < 0, “repair” X by replacing it with X + |d_min|LL’.

Ok… I don’t know whether Julia has such a LDL’ factorization algorithm…

I did find a similar Bunch-Kaufman algorithm, however. But the BK algorithm may have:

  • permutation matrix P, i.e., X = PLDL’P’-- slightly inconvenient for doing Cholesky of repaired X,
  • D may have either 1x1 or 2x2 blocks. I don’t quite see why the 2x2 block is needed? Numeric reason?

Anyway, if someone knows when the BK algorithm gives 2x2 blocks, I’m curious…

Quick way to get around julia’s strict tolerance setting for cholesky factorization is to force it to use the lower triangular part of your matrix. It’s very easy for floating point error to cause a symmetric matrix to drift out of the tolerance for factorize

LLt = cholesky(Symmetric(A, :L))
1 Like

AFAIK the best solution is just using the SVD. I have not worked with the UKF, but for the KF

  title={Kalman filter algorithm based on singular value decomposition},
  author={Wang, Liang and Libert, Ga{\"e}tan and Manneback, Pierre},
  booktitle={Decision and Control, 1992., Proceedings of the 31st IEEE Conference on},

  title={Fixed-interval smoothing algorithm based on singular value decomposition},
  author={Zhang, Youmin and Li, XR},
  booktitle={Control Applications, 1996., Proceedings of the 1996 IEEE International Conference on},

may be a starting point.

1 Like

Thanks. This reminded me of a book that was popular in the circuit early on:

G. J. Bierman, Factorization Methods for Discrete Sequential Estimation, Academic, New York, 1977

I discussed the need for square root filters with some people ca. 2010 (missile industry, which is outside the scope of my normal interests). My understanding from that conversation was that square root filters might have been important in the “old time” with limited word length processors, but was less important now (= 2010).

Of course, missiles are special in many ways: (i) relatively low order models, (ii) relatively short “life expectancy”, so long term (days, months) accuracy may not be critical.

However, I see that square root filters may be useful with larger models and for longer time periods. And maybe as a way to avoid “my” problem.

Yes, probably faster than Hermitian – and perhaps of sufficient accuracy.

OK – I took a peek. Looks interesting. However, the UKF is special in the sense that it uses different weighs on the “feature” instances – I’m not sure how easy it is to adapt the CovarianceEstimation algorithm. Of course, with 2nx+1 sigma points, 2nx of them use the same weight, so that is straightforward. But may the single feature instance with different weight mess up things?

My understanding is that problems also became more complex, and in some applications (eg economics) ill-conditioning is common because models are not a good match for the data generating proess, so extra accuracy and robustness is always good.

I would skip the square root filtering and go straight to the SVD.

1 Like

You might not end up using the package directly but rather the idea of doing a convex combination between S_w and F where S_w is the covariance matrix that you have currently computed (with weights, using cov) and F is an appropriate prior covariance matrix (the trivial case being the identity, a better one is the DiagonalCommonVariance one which is the identity scaled by by the trace of S_w divided by p (see the package doc)).

As a result you might not have an optimal shrinkage parameter anymore but you could get a decent one with cross validation. (see also the shrunk covariance estimator in sklearn)

Ultimately there’s an element of computational complexity too. If you have small dimensionality (e.g. less than 100), the SVD is probably the easiest option (though shrinkage can be seen as a “regularisation tool” here which may help in your case). However if the dimensionality is high then there would be a significant advantage in using something like a simple linear shrinkage estimator which is much faster.