Key question: is the matrix Hermitian / normal? The best methods for optimizing eigenvalues require this, because in that case even degenerate eigenvalues have a generalized directional derivative.
For non-normal matrices, in contrast, you can get defective matrices (2x2 Jordan blocks) where even directional derivatives do not exist (the sensitivity diverges). In such cases you usually need to re-think what the eigenvalues even mean.
Suppose the matrix is Hermitian, and you are trying to optimize the minimum eigenvalue. You use a Krylov method (e.g. Lanczos or LOBPCG) to compute the smallest few eigenvalues (at least 2, and usually more) and the corresponding eigenvectors. Then you have multiple options:
- If the smallest eigenvalues are within some tolerance of one another, you treat them as degenerate and compute the generalized directional derivative (what physicists call degenerate perturbation theory), e.g. by the method of Seyranian et al. (1994); see the first sketch after this list. (You then need to implement a custom optimization algorithm, too, since ordinary optimization algorithms expect an ordinary gradient.) Seyranian uses a relative tolerance of 10^{-3} to 10^{-5}, but the arbitrary nature of this tolerance is somewhat irksome.
- Alternatively, you can project the whole matrix onto the subspace of these eigenvectors, degenerate or not, linearize your matrix in the parameters, and apply an SDP solver to find an optimization step; this algorithm was introduced in Men et al. (2010), and the second sketch after this list outlines one such step. (Again, you write your own optimization “outer loop”; the linearized SDP is analogous to a steepest-descent step.) One nice feature of this is that you let mature SDP solvers deal with degeneracy issues, if any.
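To make the first option concrete, here is a minimal sketch of the first-order degenerate-perturbation computation, assuming you have already assembled an orthonormal basis Q of the (nearly) degenerate eigenspace and the Hermitian directional derivative dA (both names are mine, for illustration; Seyranian et al.’s full method adds machinery on top of this step):

```python
import numpy as np

def eigenvalue_directional_derivatives(Q, dA):
    """Generalized directional derivatives of a (nearly) degenerate eigenvalue.

    Q  : (n, k) orthonormal eigenvectors spanning the degenerate eigenspace
         of the Hermitian matrix A(p)
    dA : (n, n) Hermitian directional derivative of A along a parameter
         direction dp, i.e. dA = sum_i (dA/dp_i) dp_i

    First-order degenerate perturbation theory: the degenerate eigenvalue
    splits, to first order, into the k eigenvalues of the small matrix
    Q^H dA Q.
    """
    return np.linalg.eigvalsh(Q.conj().T @ dA @ Q)
```

A custom outer loop would then use these k derivative values (e.g. the worst one) in place of an ordinary gradient.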
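And here is a sketch of the second option: one linearized SDP step in the spirit of Men et al. (2010), using CVXPY and assuming (for simplicity) real-symmetric matrices. The function name, the trust-region constraint, and the parameter names are illustrative assumptions, not their exact formulation:

```python
import numpy as np
import cvxpy as cp

def sdp_step(Q, A, dA_list, radius):
    """One linearized SDP step for maximizing the minimum eigenvalue of A(p).

    Q       : (n, k) orthonormal eigenvectors of the lowest few eigenvalues
    A       : (n, n) symmetric matrix at the current parameters p
    dA_list : list of (n, n) symmetric partial derivatives dA/dp_i
    radius  : trust-region radius for the step dp (an assumption here,
              to keep the step inside the region where the linearization
              is trustworthy)
    """
    def sym(M):  # clean up rounding asymmetry so the SDP constraint is valid
        return 0.5 * (M + M.T)

    B0 = sym(Q.T @ A @ Q)                       # projected k-by-k matrix
    Bi = [sym(Q.T @ dA @ Q) for dA in dA_list]  # projected derivatives

    dp = cp.Variable(len(dA_list))
    t = cp.Variable()
    # linearized projection: Q^T A(p + dp) Q ~ B0 + sum_i dp_i Bi[i]
    B = B0 + sum(dp[i] * Bi[i] for i in range(len(Bi)))
    # maximize the smallest eigenvalue t of the linearized projection
    constraints = [B >> t * np.eye(Q.shape[1]), cp.norm(dp) <= radius]
    cp.Problem(cp.Maximize(t), constraints).solve()
    return dp.value
```

The outer loop would apply the returned step, recompute the eigenvectors, and re-solve until convergence.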
You don’t use different initial guesses; that will just give you the same extremal eigenvector(s) over and over. You generally just ask the Krylov method to find > 1 extremal eigenvectors all at once (a “block” iteration), as in the sketch below. Alternatively, in a Hermitian problem, you can use “deflation”, where you project out previously computed eigenvectors.
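For concreteness, a minimal SciPy sketch of such a block computation, with a 1d Laplacian standing in as a placeholder for your matrix (the size, tolerances, and k below are illustrative choices, not recommendations):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# placeholder sparse Hermitian matrix: a 1d Laplacian
n = 10_000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Lanczos (implicitly restarted) via eigsh; shift-invert with sigma=0
# targets the smallest eigenvalues more robustly than which="SA" alone.
k = 5
evals, evecs = spla.eigsh(A, k=k, sigma=0, which="LM")

# alternatively, LOBPCG with a block of k random initial vectors
# (a preconditioner, passed as M=..., speeds this up considerably)
X = np.random.default_rng(0).standard_normal((n, k))
evals2, evecs2 = spla.lobpcg(A, X, largest=False, tol=1e-8, maxiter=500)
```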
Linearly dependent eigenvectors of degenerate eigenvalues arise only in non-Hermitian/non-normal problems where you have defective matrices (≥ 2x2 Jordan blocks, geometric multiplicity < algebraic multiplicity … what physicists call an “exceptional point” or “EP”). In this case the above methods do not work, the sensitivity (even in particular directions) diverges (see e.g. Seyranian and Mailybaev (2003)), and the whole idea of using eigenvalues and eigenvectors to understand a system becomes suspect (see e.g. Trefethen (2005)).
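(To see why the sensitivity diverges at such a point, consider the standard 2x2 example: $A(\epsilon) = \begin{pmatrix} 0 & 1 \\ \epsilon & 0 \end{pmatrix}$ has eigenvalues $\lambda(\epsilon) = \pm\sqrt{\epsilon}$, so the sensitivity $d\lambda/d\epsilon = \pm 1/(2\sqrt{\epsilon})$ blows up as $\epsilon \to 0^{+}$, and at $\epsilon = 0$ the matrix is a defective Jordan block with only the single eigenvector $(1, 0)^T$.)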
If EPs can arise for the eigenvalue you are optimizing, you probably need to re-think your whole approach and whether you are optimizing the right thing.
(Usually EPs happen only because you force them intentionally … otherwise it’s pretty unlikely for two eigenvalues in the complex plane to collide. In that case, where you know you have an EP or nearly an EP, a scalable way to compute the generalized eigenvectors of large sparse systems was described in our 2017 paper.)