Statistics... cov(): Hermitian?

I’m using the cov() function from the Statistics package to compute the covariance of the unscented mappings of the sigma points (Unscented Kalman Filter). After a couple of iterations, the filter crashes with a message about problems in the Cholesky decomposition… which I use to compute the sigma points. So I am debugging…

I assume that the Cholesky problem is caused by the a posteriori covariance matrix numerically losing positive definiteness.

In the debugging phase…
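For context, the relevant pattern looks roughly like this (a simplified sketch, names made up, not my actual code):

using Statistics, LinearAlgebra

Y = propagate(sigma_points)   # hypothetical: each column is one propagated sigma point
P = cov(Y, dims = 2)          # sample covariance across the sigma points
C = cholesky(P)               # this is the call that eventually throws PosDefException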

Question: is the cov() function guaranteed to return a Hermitian matrix?

  1. mathematically, yes,

  2. numerically, yes (AFAIK),

  3. in terms of types, no, it returns a Matrix, not a LinearAlgebra.Hermitian or Symmetric. You may want to wrap your result in one of these.

Also, note that cholesky will fail for matrices that are not strictly positive definite (zero eigenvalues); there is an option to suppress this check.
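For example (a minimal sketch; Y is a matrix with one sample/sigma point per column):

using Statistics, LinearAlgebra

P  = Symmetric(cov(Y, dims = 2))   # wrap the result in an explicitly symmetric type
C  = cholesky(P)                   # throws PosDefException if P is not strictly positive definite
C2 = cholesky(P; check = false)    # never throws; inspect the result with issuccess(C2)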

Finally, if you are coding UKF, presumably you are working from a paper/text that discusses all the numerical issues. If not, you should look for one, there are plenty.

1 Like

This does not directly answer your question, but when taking the covariance in contexts where there may be numerical problems, due to dimensionality or otherwise, techniques such as those implemented in CovarianceEstimation.jl should help (there, the produced covariance is a convex combination of the sample covariance and a prior matrix such as the identity).
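Usage is roughly along these lines (a sketch from memory, assuming the cov(estimator, X) convention from StatsBase; the exact constructor arguments may differ, so check the package README):

using CovarianceEstimation

# X: n observations in rows, p variables in columns (assumed layout)
lse = LinearShrinkage(DiagonalCommonVariance(), :lw)   # shrink towards a scaled identity
S   = cov(lse, X)    # convex combination of the sample covariance and the target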

In the UKF context though, if your a posteriori covariance is nearly rank-deficient, that may be indicative of problems with your model itself, for instance if your estimate of the covariance of the state is also nearly rank-deficient and/or if the non-linearity is “too” non-linear (iirc).

2 Likes

Thanks, Tamas. The UKF struggles… I use the standard 2nx version, but will try the 2nx+1 version.

The puzzling thing is that the Ensemble KF works without a problem; from past experience, the UKF has worked more smoothly.

Thanks for the suggestion. I’ll look back at it if I cannot make the UKF work. Note, though, that the model is “simple”, so it should work.

1 Like

Yeah if the EKF works fine the UKF should work even better …

Argh. The problem was that Julia is much more sensitive to symmetry in matrices than what I’m used to from MATLAB. I simply had to make the covariance matrix symmetric (Hermitian) before I did the Cholesky factorization.
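Concretely, something along these lines before every factorization did the trick (a sketch, P being the a posteriori covariance):

using LinearAlgebra

P = (P + P') / 2             # remove the floating-point asymmetry
C = cholesky(Symmetric(P))   # or Hermitian(P) in the complex case

(As far as I understand, MATLAB’s chol only ever reads one triangle of its input, which is why the same code never complains there.)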

In any case, I also implemented the 2nx version, the 2nx+1 version, and the UKF both with the assumption of linear noise and with nonlinear noise. Sort of fun.

Suppose a covariance matrix X becomes indefinite due to numerics, e.g., the Cholesky decomposition X = L*L’ doesn’t exist. One solution is to find the smallest eigenvalue e_min, e_min < 0, and repair X by replacing it with X + |e_min|*I. Probably not efficient.
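In Julia terms, something like this (a sketch; note that adding exactly |e_min|*I only lifts the smallest eigenvalue to zero, so a small extra margin is needed if a strict Cholesky should succeed afterwards):

using LinearAlgebra

e_min = eigmin(Symmetric(X))              # smallest eigenvalue of the symmetrized matrix
if e_min < 0
    X += (abs(e_min) + sqrt(eps())) * I   # shift the spectrum back into the positive range
end
C = cholesky(Symmetric(X))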

Possible alternative? Find X = LDL’ where D is diagonal. Then find d_min, smallest diagonal element of D, and if d_min < 0, “repair” X by replacing it with X + |d_min|LL’.

Ok… I don’t know whether Julia has such an LDL’ factorization algorithm…

I did find a similar Bunch-Kaufman algorithm, however. But the BK algorithm may have:

  • a permutation matrix P, i.e., X = P*L*D*L’*P’, which is slightly inconvenient for doing a Cholesky of the repaired X,
  • D may have either 1x1 or 2x2 blocks. I don’t quite see why the 2x2 blocks are needed. A numerical reason?

Anyway, if someone knows when the BK algorithm gives 2x2 blocks, I’m curious…
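For reference, the dense routine I found in LinearAlgebra is bunchkaufman. Roughly how to pull the factors out (a sketch; X is the covariance in question, and the 2x2 example at the end has an all-zero diagonal, so no 1x1 pivot can work, which is perhaps part of the reason for the 2x2 blocks):

using LinearAlgebra

F = bunchkaufman(Symmetric(X, :L))   # X = P*L*D*L'*P', with D block diagonal (1x1 and 2x2 blocks)
P, L, D = F.P, F.L, F.D              # permutation matrix, unit lower triangular, block-diagonal factor

# A case that forces a 2x2 block: all diagonal entries are zero,
# so no stable 1x1 pivot exists.
bunchkaufman(Symmetric([0.0 1.0; 1.0 0.0], :L)).D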

A quick way to get around Julia’s strict symmetry check for the Cholesky factorization is to force it to use only the lower triangular part of your matrix. It’s very easy for floating-point error to cause a nominally symmetric matrix to fail the check in cholesky/factorize:

LLt = cholesky(Symmetric(A, :L))   # only the lower triangle of A is read; the upper triangle is ignored
1 Like

AFAIK the best solution is just using the SVD. I have not worked with the UKF, but for the KF

@inproceedings{wang1992kalman,
  title={Kalman filter algorithm based on singular value decomposition},
  author={Wang, Liang and Libert, Ga{\"e}tan and Manneback, Pierre},
  booktitle={Proceedings of the 31st IEEE Conference on Decision and Control},
  pages={1224--1229},
  year={1992},
  organization={IEEE}
}

@inproceedings{zhang1996fixed,
  title={Fixed-interval smoothing algorithm based on singular value decomposition},
  author={Zhang, Youmin and Li, XR},
  booktitle={Proceedings of the 1996 IEEE International Conference on Control Applications},
  pages={916--921},
  year={1996},
  organization={IEEE}
}

may be a starting point.
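As a minimal illustration of the idea (not the full SVD-based filter from those papers; P is a symmetric covariance, name illustrative): the SVD gives an orthogonal factor and nonnegative singular values, from which a square-root factor for the sigma points can be built without requiring strict positive definiteness:

using LinearAlgebra

Psym = (P + P') / 2          # make it exactly symmetric first
U, s, _ = svd(Psym)          # for a symmetric psd matrix this is Psym = U * Diagonal(s) * U'
S = U * Diagonal(sqrt.(s))   # square-root factor: S * S' recovers Psym, tiny negative eigenvalues get flipped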

1 Like

Thanks. This reminded me of a book that was popular in those circles early on:

G. J. Bierman, Factorization Methods for Discrete Sequential Estimation, Academic Press, New York, 1977.

I discussed the need for square root filters with some people ca. 2010 (missile industry, which is outside the scope of my normal interests). My understanding from that conversation was that square root filters might have been important in the “old days” with limited-word-length processors, but were less important by then (2010).

Of course, missiles are special in many ways: (i) relatively low order models, (ii) relatively short “life expectancy”, so long term (days, months) accuracy may not be critical.

However, I see that square root filters may be useful with larger models and for longer time periods. And maybe as a way to avoid “my” problem.

Yes, probably faster than Hermitian – and perhaps of sufficient accuracy.

OK, I took a peek. Looks interesting. However, the UKF is special in the sense that it uses different weights on the “feature” instances, so I’m not sure how easy it is to adapt the CovarianceEstimation algorithm. Of course, with 2nx+1 sigma points, 2nx of them use the same weight, so that part is straightforward. But might the single instance with a different weight mess things up?
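(For concreteness, the weighted covariance I compute is essentially the following; a sketch only, names illustrative, and I am glossing over the fact that the centre point can have different mean and covariance weights in some formulations:)

using LinearAlgebra

# chi: nx x (2nx+1) matrix of propagated sigma points, w: vector of weights
xbar = sum(w[i] * chi[:, i] for i in eachindex(w))
S_w  = sum(w[i] * (chi[:, i] - xbar) * (chi[:, i] - xbar)' for i in eachindex(w))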

My understanding is that problems have also become more complex, and in some applications (e.g. economics) ill-conditioning is common because models are not a good match for the data generating process, so extra accuracy and robustness is always good.

I would skip the square root filtering and go straight to the SVD.

1 Like

You might not end up using the package directly but rather the idea of doing a convex combination between S_w and F, where S_w is the covariance matrix that you have currently computed (with weights, using cov) and F is an appropriate prior covariance matrix (the trivial case being the identity; a better one is the DiagonalCommonVariance target, which is the identity scaled by the trace of S_w divided by p, see the package docs).
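Something like this, in other words (a sketch; S_w is the weighted sample covariance, lambda the shrinkage intensity you would have to pick):

using LinearAlgebra

p = size(S_w, 1)
F = (tr(S_w) / p) * I(p)                    # DiagonalCommonVariance-style target: scaled identity
lambda = 0.1                                # shrinkage intensity, e.g. chosen by cross-validation
S_shrunk = (1 - lambda) * S_w + lambda * F  # convex combination; lifts small eigenvalues toward lambda*tr(S_w)/p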

As a result you might not have an optimal shrinkage parameter anymore, but you could get a decent one with cross-validation (see also the shrunk covariance estimator in sklearn).

Ultimately there’s an element of computational complexity too. If you have small dimensionality (e.g. less than 100), the SVD is probably the easiest option (though shrinkage can be seen as a “regularisation tool” here which may help in your case). However if the dimensionality is high then there would be a significant advantage in using something like a simple linear shrinkage estimator which is much faster.

2 Likes