Issue: PDMats.jl matrices losing positive-definiteness with Optim.jl autodiff

You could also just use a change of variables to do nonlinear optimization over semidefinite matrices (which you have to include in “SPD” for optimization since the set of strictly positive-definite matrices is not closed). That is, instead of optimizing f(A) over “SPD” matrices A, you optimize f(X^T X) over unconstrained m \times m matrices X, or better yet f(Hermitian(X'X)) to ensure that it is exactly Hermitian despite roundoff errors.

This has no loss of generality, allows you to use any nonlinear optimization algorithm, and is easy. (The downside is that you might lose some special structure in f that would otherwise let you use something specialized like an SDP solver, but I’m guessing you don’t have a straightforward SDP or you would be using such a solver already.)

You can also, of course, use specialized methods for semidefinite constraints. And, of course, it depends a lot on what you want to do with the SPD matrix in your function. If you want to solve linear systems, for example, it would be more efficient to optimize over the vector space of lower- or upper-triangular matrices X, since then you have the Cholesky factorization for free.

Note, however, that X^T X may be slightly indefinite due to roundoff errors, so your f(X^T X) computation needs to be robust to such things. This is a general issue to be aware of when working with SDP matrices; how to ensure robustness depends on what precise calculation you’re performing, but a general idea is to treat slightly negative eigenvalues as zeros. Alternatively you can add some regularization like replacing f(A) with f(A + \varepsilon \Vert A \Vert_F I / m) for a small \varepsilon proportional to the precision. This has come up many times on this forum, e.g.:

(But this may not be an issue if your optimization problem does not favor matrices that are close to singular, i.e. if the SDP constraint is inactive at the optimum — in that case, as long as you start with a random X so that X^T X is far from singular, probably the gradient steps will keep things that way.)

5 Likes