Numerical inaccuracies when constructing correlation matrices

The basic issue is that trying to make a matrix exactly semidefinite is inherently nonrobust to roundoff errors. Different ways of computing the eigenvalues will give slightly different values due to differences in floating-point effects, so even if you make the eigenvalues \ge 0 for one computation you might get slightly negative eigenvalues from another computation/algorithm with the same matrix.

The question is, what are you trying to do with the matrix S2 that requires it to be strictly semidefinite? It is generally better to try to fix your subsequent computation to be tolerant of tiny negative eigenvalues in such cases. For example, sqrt(Symmetric(A)) includes a backwards-stable correction to return a real matrix square root for nearly semidefinite matrices (see julia#35057).

Alternatively, if you don’t care too much about accuracy, you can just do something like S2 + 1e-8*norm(S2)*I, of course (sacrificing half of the significant digits).

You should also consider the possibility that you have a bug in your code, or are mis-using the algorithm, or that the source you are using is incorrect. Your linked article claims that the algorithm generates “full-rank” correlation matrices, but your S2 matrix is extremely rank deficient: it is a 1000x1000 matrix with numerical rank (number of non-negligible eigenvalues) of only 64 . Is this really what you want?

3 Likes