Minimization of degenerate eigenvalue of sparse matrix

Hi,

I have a tricky optimization problem where I need to minimize a specific eigenvalue of a matrix M(\theta) that depends continuously on a set of parameters \theta. The problem is that the eigenvalue is usually degenerate, which makes the minimization problem non-differentiable. There are known ways around this, and with help from the excellent resources shared before on this forum ([1], [2] – thank you for these!), I have implemented a basic solver that works reasonably well for my problem, as long as the matrices are not too large (say, smaller than about 500x500).

However, my matrices are usually very sparse, and I would like to take advantage of this somehow. The central question for me is: how can I efficiently and reliably find the full degenerate eigenspace using sparse solvers? My (very rudimentary) understanding of Krylov methods and the like is that a single run will yield an essentially random eigenvector from the degenerate space. Is it safe to just keep running sparse eigensolves with different initial guesses, record all the eigenvectors, and stop once the latest vector is linearly dependent on the previous ones?

If there is a package that can deal with this optimization problem out of the box and makes my question obsolete, that would of course be even better.

In case it is relevant: My matrix has only real elements and is not symmetric, but I know on physical grounds that all eigenvalues are nevertheless real and \leq 0, for all values of the parameters \theta. The eigenvalue I am interested in is the largest non-zero eigenvalue, and I know the index of this eigenvalue beforehand. This is because I know a priori how many zero eigenvalues there are (usually between 3 and 10), and I even know the eigenspace of all the zero eigenvalues – in principle I could project them out, but I am not sure if that is worth the effort if my matrix is larger than a few hundred rows and columns.
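For concreteness, this is roughly the kind of “projecting out” I had in mind – an untested sketch, assuming KrylovKit.jl and that I already have bases Z (right) and W (left) of the known zero eigenspace with W'Z = I (the zero eigenvalue is not defective in my case, since it comes from conservation laws):

```julia
using KrylovKit, SparseArrays

# Wielandt-style deflation: x ↦ M x - σ Z (W' x) shifts the known zero
# eigenvalues to -σ and leaves all other eigenpairs untouched, so asking for
# the largest real eigenvalue then targets the eigenvalue of interest directly.
function deflated_leading_eigs(M::SparseMatrixCSC, Z::AbstractMatrix, W::AbstractMatrix;
                               σ = 1e3, howmany = 4)
    apply = x -> M * x - σ * (Z * (W' * x))
    return eigsolve(apply, size(M, 1), howmany, :LR)   # Arnoldi, matrix-free
end
```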

Any comments or suggestions are highly appreciated! :slight_smile:

Key question: is the matrix Hermitian / normal? The best methods for optimizing eigenvalues require this, because in that case even degenerate eigenvalues have a generalized directional derivative.

Whereas for non-normal matrices, you can get defective matrices (2x2 Jordan blocks) where even directional derivatives do not exist (the sensitivity diverges). In such cases you usually need to re-think what the eigenvalues even mean.

Suppose it is Hermitian, and you are trying to optimize the minimum eigenvalue. You use a Krylov method (e.g. Lanczos or LOBPCG) to compute the smallest few eigenvalues (at least 2, but usually more than this) and corresponding eigenvectors. Then, you have multiple options:

  • If the smallest eigenvalues are within some tolerance of one another, you treat them as degenerate and compute the generalized directional derivative (what physicists call degenerate perturbation theory) by e.g. the method of Seyranian et al. (1994). (You then need to implement a custom optimization algorithm, too, since ordinary optimization algorithms take an ordinary gradient.) Seyranian uses a relative tolerance of 10^{-3} to 10^{-5}, but the arbitrary nature of this tolerance is somewhat irksome. A minimal sketch of this option follows the list below.

  • Alternatively, you can project the whole matrix onto the subspace of these eigenvectors, degenerate or not, linearize your matrix in the parameters, and apply an SDP solver to find an optimization step: this algorithm was introduced in Men et al. (2010). (Again, you write your own optimization “outer loop”; the linearized SDP is analogous to a steepest-descent step.) One nice feature of this is that you let mature SDP solvers deal with degeneracy issues, if any.
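To make the first option concrete, here is a minimal, untested sketch using KrylovKit.jl; `dA` stands for the list of (sparse) matrices ∂A/∂θ_k, and the cluster tolerance is the same arbitrary knob discussed above:

```julia
using KrylovKit, LinearAlgebra

# Compute the smallest few eigenpairs of a Hermitian A and keep those within
# a tolerance of the smallest one (the "degenerate cluster").
function degenerate_cluster(A, x0; howmany = 6, cluster_tol = 1e-4)
    vals, vecs, info = eigsolve(A, x0, howmany, :SR; ishermitian = true)
    keep = findall(λ -> abs(λ - vals[1]) ≤ cluster_tol * max(abs(vals[1]), 1), vals)
    return vals[keep], hcat(vecs[keep]...)
end

# Generalized directional derivative of the clustered eigenvalue along a
# parameter step δθ: the eigenvalues of the small projected matrix
# V' (Σ_k δθ[k] dA[k]) V (degenerate perturbation theory).
function directional_derivatives(dA::Vector, V::AbstractMatrix, δθ::AbstractVector)
    G = sum(δθ[k] * (V' * (dA[k] * V)) for k in eachindex(dA))
    return eigvals(Hermitian(G))
end
```

The custom outer loop then has to choose a step δθ that decreases all of these directional derivatives at once (or solve the small linearized SDP of the second option).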

You don’t use different initial guesses – that will just give you the same extremal eigenvector(s) over and over. You generally just ask the Krylov method to find more than one extremal eigenvector at once (a “block” iteration). Alternatively, in a Hermitian problem, you can use “deflation”, where you project out previously computed eigenvectors.

Linearly dependent eigenvectors of degenerate eigenvalues arise only in non-Hermitian/non-normal problems where you have defective matrices (≥ 2x2 Jordan blocks, geometric multiplicity < algebraic multiplicity … what physicists call an “exceptional point” or “EP”). In this case the above methods do not work, the sensitivity (even in particular directions) diverges (see e.g. Seyranian and Mailybaev (2003)), and the whole idea of using eigenvalues and eigenvectors to understand a system becomes suspect (see e.g. Trefethen (2005)).

If EPs can arise for the eigenvalue you are optimizing, you probably need to re-think your whole approach and whether you are optimizing the right thing.

(Usually EPs happen only because you force them intentionally … otherwise it’s pretty unlikely for two eigenvalues in the complex plane to collide. In this case, where you know you have an EP or nearly an EP, a scalable way to compute the generalized eigenvectors of large sparse systems was described by our 2017 paper.)

3 Likes

Thanks for the in-depth reply.

Unfortunately, my matrix is neither Hermitian nor normal; the only property I know for sure is that all elements are real. Still, I consistently find that all eigenvalues are real and non-positive. This suggests that my matrix is pseudo-Hermitian, i.e. it is Hermitian with respect to a different inner product, but I have no idea how to find that inner product or why my matrix seems to have this property in the first place.

In practice, I find that the method described in Seyranian et al. works well for my problem; the only difference in my case is that I have to keep track of right and left eigenvectors separately. My main issue is extending my already working implementation of this to exploit sparsity.
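For reference, the kind of expression I mean for the non-degenerate part – a minimal sketch, where `dA` stands for the (sparse) parameter derivatives ∂M/∂θ_k and v, w are a right/left eigenvector pair of a simple eigenvalue:

```julia
using LinearAlgebra

# For a simple eigenvalue λ with right eigenvector v and left eigenvector w,
# the parameter derivatives are ∂λ/∂θ_k = (w' * dA[k] * v) / (w' * v).
# In the degenerate case one instead diagonalizes the small matrix W' * dA * V
# built from biorthonormal bases (W' * V = I) of the left/right eigenspaces.
function eigenvalue_gradient(dA::Vector, v::AbstractVector, w::AbstractVector)
    s = dot(w, v)
    return [dot(w, dA[k] * v) / s for k in eachindex(dA)]
end
```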

I see, good to know that “block” iteration is the relevant approach. My idea of rerunning from different initial guesses and then orthogonalizing came from my reading of the KrylovKit.jl documentation:

From a theoretical point of view, Krylov methods can at most find a single eigenvector associated with a targetted eigenvalue, even if the latter is repeated, i.e. the eigenvalue has multiple linearly independent eigenvectors or thus an eigenspace with dimension (geometric multiplicity) larger than one. In the case of such a repeated eigenvalue, the specific eigenvector that is returned is determined by the starting vector x₀ . For large problems, this turns out to be less of an issue in practice, as often a second linearly independent eigenvector is generated out of the numerical noise resulting from the orthogonalisation steps in the Lanczos or Arnoldi iteration. Nonetheless, it is important to take this into account and to try not to depend on this potentially fragile behaviour, especially for smaller problems.

But I think I just misunderstood this section of the docs.

I just saw that the newest version of KrylovKit.jl has a BlockLanczos algorithm, but for my non-Hermitian problem I guess I need to wait until a BlockArnoldi is implemented. Once that is available, am I right that as long as I request more eigenvectors than the multiplicity of my eigenvalue, the method will return a complete eigenbasis of the degenerate eigenvalue?

Also, regarding the need to implement a custom optimization algorithm: can I do better than simply passing the steepest-descent direction to a basic gradient descent? I also have a couple of non-linear constraints that I am enforcing during optimization, and I need to use quite small step sizes to keep these constraints satisfied, which also adds to the total runtime.

Where does your matrix come from?

It comes from a linear stability analysis of a high-dimensional ODE system. That system is derived from a type of mass-action chemical reaction network. I am linearizing the ODE system around the stable stationary solution, so I know that the relevant eigenvalues must have negative real part. (There are zero eigenvalues corresponding to perturbations that do not conserve mass, but these can be projected out.) Beyond this, I don’t know of any additional structure in my problem, so I am a bit puzzled by the apparent pseudo-Hermiticity of my matrix, and it is unclear to me whether or how that could be exploited.

The absolute values of the eigenvalues are related to the relaxation times of the system after small perturbations around equilibrium and my optimization problem corresponds to minimizing the longest relaxation time.

Isn’t the linearization of this some kind of linear diffusion-like equation? Shouldn’t that be Hermitian if the original equations satisfy detailed balance? e.g. diffusion operators \rho(x) \nabla^2 are Hermitian in an inner product weighted by 1/\rho.

For example, Cáceres and Cañizo (2016) linearizes a reaction–diffusion system around equilibrium and shows that the resulting matrix is Hermitian under a weighted inner product (see the proof of Lemma 6.2).

3 Likes

Thank you. You are 100% correct. I was working with a less general expression for the stability matrix, where the symmetry was not as clear as in Eq. (55) of the paper you sent. In hindsight the symmetry is painfully obvious.

What does this mean for my optimization problem in practice? I guess if I work with the symmetrized version of my matrix, I can save lots of time because I don’t have to compute left and right eigenvectors separately, and I can directly use e.g. a block Lanczos method to find the eigenbasis?

Lots of good things. You can use Lanczos, which is more efficient than Arnoldi; you have access to the block Lanczos implementation; you are guaranteed real eigenvalues; you are guaranteed orthogonal eigenvectors (in the appropriate inner product/rescaling), so the matrix is never defective and the generalized directional derivative always exists; and you can also use the subspace-SDP method if you want.
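A minimal sketch of what the symmetrized sparse eigensolve could look like with KrylovKit.jl, assuming you know a positive weight vector `w` such that `Diagonal(w) * M` is symmetric (e.g. inverse equilibrium concentrations; that part is problem-specific):

```julia
using KrylovKit, LinearAlgebra, SparseArrays

# With D = Diagonal(sqrt.(w)), the similarity transform S = D * M * D⁻¹ is
# symmetric whenever Diagonal(w) * M is, and has the same eigenvalues as M.
# Right eigenvectors of M are recovered as v = D⁻¹ * u. The product is applied
# matrix-free, so the sparsity of M is preserved.
function symmetrized_eigs(M::SparseMatrixCSC, w::AbstractVector; howmany = 6)
    d = sqrt.(w)
    apply = x -> d .* (M * (x ./ d))           # x ↦ D M D⁻¹ x
    vals, us, info = eigsolve(apply, size(M, 1), howmany, :LR; issymmetric = true)
    return vals, [u ./ d for u in us]          # map back to eigenvectors of M
end
```

The same matrix-free operator should also be usable with the block Lanczos solver, though I have not tested that interface.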

2 Likes

Wait a second, [1612.03687] Close-to-equilibrium behaviour of quadratic reaction-diffusion systems with detailed balance is about reaction-diffusion systems?

Are you considering an ODE system, or a PDE system here?

But nothing prevents your ODE system from encountering a bifurcation when you change \theta, right? You would keep a stable object, but your stable stationary solution could e.g. undergo a Hopf bifurcation to a periodic attractor, or split into a saddle and two stable stationary solutions.

Or do you have some monotone entropy thing that prevents such shenanigans? (e.g. if your system is closed, not open)

Regarding the “degenerate eigenvalues tend to only happen for a reason” thing: Mass-action is kinda dicey – if your stoichiometry lines up in a bad way, there can be algebraic relations left over, and the resulting system is non-generic in the parameters. I would tend to view that as a fluke / artifact: any slight modification of mass-action would remove the degeneracy. The experiment to run would be:

  1. Is your eigenvalue really degenerate, almost independently of \theta?
  2. Try other reaction kinetics, to see whether the degeneracy persists.

One can probably do some checks on whether you are in the “mass-action fluke” situation, basically by running on finite fields and using the Schwartz–Zippel lemma - Wikipedia.

More generally, anything you can express as rational functions in \theta is eligible for black-box factorization, i.e. you only need to be able to evaluate the thing (like the characteristic polynomial). Writing down an expansion into monomials for the characteristic polynomial is of course infeasible, but you don’t need that: just go to a finite field, plug in values, and run Gaussian elimination.

I don’t remember the exact checks you would need, nor the “mass-action algebraic degeneracy” counterexamples. Related would be [1606.00279] Sensitivity of chemical reaction networks: a structural approach, Section 3 (Regular multimolecular systems).

But I could think about it again.

PS. What you need to do to verify whether you really have structural degeneracy is to check for repeated factors in the factorization of the characteristic polynomial. So you really just need to plug random values from a finite field into all your parameters, compute the characteristic polynomial in one variable, and check gcd(p, p') – after dividing out the x^k factor from your known zero eigenvalues, which would otherwise always produce a repeated factor. Hence, you only need to compute ~500 determinants of 500x500 matrices over a finite field with some random 64-bit prime modulus, interpolate to get the full characteristic polynomial, and then possibly repeat once with different random values, for reasonable confidence. The computation shouldn’t take longer than a coffee.
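For concreteness, a rough sketch of this check, assuming AbstractAlgebra.jl (recent versions; constructor names may differ) and a hypothetical `build_matrix(θ)` that evaluates your stability matrix with exact Rational{BigInt} entries at integer parameter values:

```julia
using AbstractAlgebra

# Test for a structurally repeated factor in the characteristic polynomial by
# working over a large prime field at random integer parameter values.
function structurally_degenerate(build_matrix, nθ; p = 2^61 - 1, trials = 2)
    F = GF(p)
    R, x = polynomial_ring(F, "x")
    for _ in 1:trials
        θ  = rand(1:10^6, nθ)                  # random parameter point
        A  = build_matrix(θ)                   # exact rational entries
        Ap = matrix(F, map(q -> divexact(F(numerator(q)), F(denominator(q))), A))
        cp = charpoly(R, Ap)
        while iszero(coeff(cp, 0))             # divide out the known zero eigenvalues
            cp = divexact(cp, x)
        end
        degree(gcd(cp, derivative(cp))) > 0 || return false   # squarefree here
    end
    return true   # repeated factor at every random point: likely structural
end
```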

1 Like

This is great. Thank you so much for your help! That SDP method also looks very interesting, I will give that a try and compare.

I am assuming throughout that my system is well mixed, so I don’t have the diffusion term and my system just contains ODEs. But the relevant term of the stability matrix (Eq. (55) in that paper) has the same symmetry properties whether or not diffusion is included.

But nothing prevents your ODE system from encountering a bifurcation when you change \theta, right? You would keep a stable object, but your stable stationary solution could e.g. undergo a Hopf bifurcation to a periodic attractor, or split into a saddle and two stable stationary solutions.

My reaction network is really quite “boring” in this sense, because it is derived from an aggregation/fragmentation process where everything is reversible, detailed balance holds, and the system is closed. In fact I can compute the steady state directly from statistical mechanics without solving the ODE system at all, so I know that the stable solution is always unique, attractive and smoothly-changing with \theta.

I usually only encounter degeneracies as a result of the optimization process, for generic choices of \theta I don’t find any repeated eigenvalues (except for the zero-eigenvalues from perturbations that don’t conserve mass).

Because of this I don’t really expect “real”/structural degeneracies to be an issue. Thanks for these tools for checking for structural degeneracies though – these might come in handy in the future.

That’s good. You could use the paper I linked to directly get the sparsity structure of the derivative of the steady state with respect to \theta for “generic” kinetics. But that’s not what you’re after.

Regarding the degenerate eigenvalues… if you minimize the leading eigenvalue, then it will of course tend to collide with the second one: Your gradient pulls the leading eigenvalue to the left, while there is no gradient acting on the second one.

There must be existing literature on this issue (how to minimize the spectrum of a parameter-dependent matrix), but I am not acquainted with it.

I would try e.g. \sum e^{\lambda_i} or \sum e^{-\lambda_i^2} as an optimization target. Maybe with a cut-off to only include terms with |\lambda_i - \lambda_0|<c, in order to avoid computing the full spectrum.
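As one possible realization of this suggestion – a minimal sketch of a log-sum-exp soft maximum over a computed cluster of eigenvalues, where λ and the columns of V come from the (symmetrized) sparse eigensolve, `dA[k]` stands for ∂A/∂θ_k, and the temperature τ is an arbitrary knob; the per-eigenvalue gradient formula assumes the eigenvalues in the cluster are simple:

```julia
using LinearAlgebra

# Smooth surrogate for the leading eigenvalue of a symmetric matrix:
# f = τ * log(Σ_i exp(λ_i / τ)), with gradient Σ_i softmax_i * ∂λ_i/∂θ_k
# and ∂λ_i/∂θ_k = v_i' dA[k] v_i for orthonormal eigenvectors v_i.
function soft_leading_eigenvalue(λ::AbstractVector, V::AbstractMatrix, dA::Vector; τ = 1e-2)
    λmax = maximum(λ)
    e = exp.((λ .- λmax) ./ τ)                 # shifted for numerical stability
    w = e ./ sum(e)                            # softmax weights over the cluster
    f = λmax + τ * log(sum(e))                 # smooth maximum of the cluster
    g = [sum(w[i] * dot(V[:, i], dA[k] * V[:, i]) for i in eachindex(λ))
         for k in eachindex(dA)]
    return f, g
end
```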

Eigenvalue collisions might be unavoidable (e.g. two eigenvalues may need to change places in order to reach the minimum), but you can at least make a token effort to avoid them – otherwise, once you have an eigenvalue collision, your gradients can become discontinuous.

See the papers cited above, which are on precisely this issue.

1 Like