Matrix exponential * vector for various scales

Given a matrix Q which is a continuous-time transition rate matrix (generator), i.e.

  1. Q_{i,j} \ge 0 for j \ne i,
  2. \sum_j Q_{i,j} = 0 \ \forall i

I would like to calculate \exp(Qt)v for multiple t's (fixed v, Q).

Potentially Q is around 500x500 or even larger, may be sparse or dense, and I have around a hundred values of t. (Irreducibility can be assumed.)

I am aware of ExponentialUtilities.expv_timestep but I am not sure if the above matrix has special structure that can be exploited.


Can your t values be chosen equally spaced?


That seems small enough that you could just compute a suitable factorization (e.g. eigen if your matrix happens to be symmetric) and then compute the matrix exponentials. This is reasonably simple and robust. For sparse methods like Krylov subspaces you always need to ensure convergence, which can be a source of headaches.


For non-Hermitian matrices, the exp algorithm doesn’t use a factorization, though, and it’s not really set up to re-use computations for different scalings of the matrix.

If your matrix is Hermitian (or normal), I agree that diagonalization is the way to go.


Even if the t values are not equally spaced, we can still consider this problem to be the solution of an ODE sampled at various time steps, right?


Are you considering a doubling algorithm? The t values are affine transformations of Chebyshev nodes, but I guess I can do a change of variables to angles and then they are equally spaced.

The only special thing about the matrix is that its rows sum to 0, nothing else. I guess I could do an SVD then — for sufficiently many t values, it could be faster than matrix exponential algorithms since the cost is one-time.

Yes, I will benchmark that too. Any suggestions about the solver? (If not, I would just use Tsit5().) My main concern is not accuracy per se, but that the solver does not enforce \exp(At)v \ge 0 (mathematically it follows from the properties of A and v, but numerically the solution can dip below 0).

I don’t know of any way to use the SVD to get a matrix exponential. Generally, to leverage a matrix decomposition for this, you need to decompose Q using a similarity. If you have some reason to expect Q to be diagonalizable with a well conditioned matrix of eigenvectors, you could consider diagonalization. You might get away with it for a particular matrix, but it sounds like you want something you can run on more than one Q.

A Schur decomposition can also be used, but without some extra effort to deal with repeated or nearly repeated eigenvalues, it’s likely to be quite unstable. And in any event, I don’t know of any way to make it less than an O(n^3) computation after computing the Schur decomposition. So I think it doesn’t help much with multiple t values.

Everything I know about has been mentioned here already. There are potential accuracy and/or efficiency problems with almost anything you can do for a non-normal matrix. My guess is that your first thought about using the Krylov method in ExponentialUtilities.expv_timestep might be competitive. It would be fairly high up on my list of things I would try, anyway. Some sort of doubling scheme for uniformly spaced points also sounds like it might work well. My impression, which might be out of date, has always been that ODE solvers can be quite slow if you want higher accuracy.


Thanks. I found a fairly recent paper that deals exactly with the problem posed in the original question, and indeed it finds the Krylov projection method competitive (the 5th method they list).

They also discuss (and dismiss) RK4, so the ODE-based approach is what I will proceed with for now, as I don’t want to get distracted by implementing a new algorithm. DifferentialEquations.jl has a ton of excellent solvers I can experiment with, many of which improve on RK4. If I later replace this part of the code, at least I will have the unit tests ready :wink:
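As a rough sketch of that ODE route (assuming OrdinaryDiffEq.jl is available; the toy Q, v, and ts below are placeholders for the actual problem data):

```julia
using LinearAlgebra, OrdinaryDiffEq

Q = [-1.0 1.0; 2.0 -2.0]    # toy generator: off-diagonals ≥ 0, rows sum to 0
v = [1.0, 0.0]
ts = [0.1, 0.5, 1.0]

# Solve u' = Q*u with u(0) = v, saving the solution at the requested times.
prob = ODEProblem((du, u, p, t) -> mul!(du, Q, u), v, (0.0, maximum(ts)))
sol = solve(prob, Tsit5(), saveat = ts, abstol = 1e-10, reltol = 1e-10)

# sol.u[i] ≈ exp(Q * ts[i]) * v
```

With saveat the solution is stored only at the requested times, so the accuracy/positivity tradeoff is governed by the tolerances.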


You lost me on that. I just skimmed that really quickly, so maybe I’m misunderstanding, but they seem to be showing Krylov winning or very close to R-MJP, with the main disadvantage of Krylov being the need to choose a time step and Krylov subspace dimension. The Krylov method in ExponentialUtilities.expv_timestep makes those choices automatically. I don’t know the full details of what the code does, but they cite a paper which discusses using error bounds for approximating a matrix exponential for those choices. So it seems like that would be an efficient, convenient, and already implemented black-box method for what you want. In contrast, the ODE approach (RK4) had the worst performance. Maybe you can improve on RK4, but it looks like there is a factor of 5 difference in run times between RK4 and Krylov. Trying to close that gap seems like it would be more work than just using the existing Krylov method. Overall, these results look to me more like a reason to avoid ODE methods and go with Krylov.
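For reference, a minimal call of that black box might look like this (assuming ExponentialUtilities.jl; the toy Q, v, and ts stand in for the real data):

```julia
using LinearAlgebra, ExponentialUtilities

Q = [-1.0 1.0; 2.0 -2.0]    # toy generator: off-diagonals ≥ 0, rows sum to 0
v = [1.0, 0.0]
ts = [0.1, 0.5, 1.0]

# U[:, j] ≈ exp(Q * ts[j]) * v; the internal time step and Krylov
# subspace dimension are chosen adaptively.
U = expv_timestep(ts, Q, v)
```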

I think you should just take a look at the spectral gaps, like

julia> using LinearAlgebra; A = rand(500, 500); spec = eigvals(A); minimum(abs(spec[i] - spec[j]) for i = 1:length(spec) for j = 1:length(spec) if i != j)

If your system has structure that forces collision of eigenvalues (e.g. automorphisms), and you can’t use that structure, then diagonalization may be bad.

If your spectral gaps are sensible, then diagonalization should work ok. Random matrices tend to have good spectral gaps (random matrix theory – eigenvalues repel each other).
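If diagonalization is viable, the per-t cost collapses to elementwise exponentials of the eigenvalues. A minimal sketch (expv_diag is a made-up name; it assumes a well conditioned eigenvector matrix):

```julia
using LinearAlgebra

# exp(Q*t)*v via diagonalization Q = V * Diagonal(λ) * V⁻¹; only safe when
# V is well conditioned. The one-time O(n^3) cost is amortized over all t.
function expv_diag(Q, v, ts)
    λ, V = eigen(Q)               # λ, V may be complex for non-symmetric Q
    w = V \ v                     # one linear solve, reused for every t
    [real(V * (exp.(λ .* t) .* w)) for t in ts]
end
```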

PS. If you only have a few almost-colliding eigenvalues, then you can always take a Schur decomposition and then try to partially diagonalize, leaving small “almost Jordan” blocks [a b; 0 a'] with b / (a-a') \gg 1. These make diagonalization / Jordan form numerically unstable, but you can leave them as is if the corresponding spaces are low-dimensional.


If they were equally spaced times n \Delta t, you could use \exp(An\Delta t) = [\exp(A \Delta t)]^n, allowing you to compute a single matrix exponential and then just multiply by it repeatedly.
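In code, that is one dense exp call followed by repeated matrix-vector products (a sketch with a toy generator; expv_uniform is a made-up name):

```julia
using LinearAlgebra

# For equally spaced times n*Δt, a single dense exponential M = exp(Q*Δt)
# suffices; every further time point is just one matrix-vector product.
function expv_uniform(Q, v, Δt, nmax)
    M = exp(Q * Δt)
    vs = Vector{typeof(v)}(undef, nmax)
    u = v
    for n in 1:nmax
        u = M * u                  # u = exp(Q * n * Δt) * v
        vs[n] = u
    end
    return vs
end

Q = [-1.0 1.0; 2.0 -2.0]           # toy generator
v = [1.0, 0.0]
vs = expv_uniform(Q, v, 0.1, 10)
```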


Just sketching out an idea for the Linear Algebra buffs (profs?) here to evaluate on how to handle not-equally-spaced t:

  1. choose a small \Delta t (around T/2^k where T = maximum(t) ).
  2. calculate M = \exp(A\Delta t).
  3. using repeated squaring calculate M^{2^i} for i = 1 .. k .
  4. approximate each t_j using m_j \cdot \Delta t (m_j integer).
  5. calculate \exp(A \cdot m_j \cdot \Delta t) using matrices from 3.
  6. fix up the resulting matrix using an approximation of \exp((t_j - m_j\cdot \Delta t) \cdot A); the approximation could be linear (for speed).

Some speedups can be done for step 5 by reusing matrices from different t values.
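A sketch of steps 1–6 under those assumptions (expv_doubling is a made-up name; the linear fix-up in step 6 is the first-order Taylor correction, and for simplicity the result is applied directly to v rather than assembled as matrices):

```julia
using LinearAlgebra

# Precompute M^(2^i) by repeated squaring, assemble exp(Q * m_j * Δt) * v
# from the binary digits of m_j, then apply a first-order correction for
# the remainder t_j - m_j*Δt.
function expv_doubling(Q, v, ts; k = 20)
    T = maximum(ts)
    Δt = T / 2^k
    pows = Vector{Matrix{Float64}}(undef, k + 1)
    pows[1] = exp(Q * Δt)                     # M = M^(2^0)
    for i in 2:k+1
        pows[i] = pows[i-1] * pows[i-1]       # M^(2^(i-1)), by squaring
    end
    map(ts) do t
        m = floor(Int, t / Δt)                # t ≈ m * Δt, m integer
        u = copy(v)
        for i in 0:k
            if (m >> i) & 1 == 1
                u = pows[i+1] * u             # apply M^(2^i)
            end
        end
        r = t - m * Δt                        # remainder, < Δt
        u + r * (Q * u)                       # linear (first-order) fix-up
    end
end
```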


Can this method be used to solve the following PDE:

\frac{\partial q}{\partial t} = \nabla^2q - w(\mathbf{r})q

where q=q(\mathbf{r},t) and \mathbf{r} is a spatial vector. Currently we use an operator-splitting method (only 2nd-order accurate) or FD. I wonder if there is any advantage to solving for q as

q(\mathbf{r}, n\Delta t) = [\exp(A\Delta t)]^n q(\mathbf{r}, 0)

with A = \nabla^2 - w.

You probably don’t want to construct a dense matrix exponential if you have a discretized PDE, unless the problem is tiny.

(Whereas @Tamas_Papp’s matrices were 500x500, small enough for dense-matrix methods to be competitive.)


I see. The size depends on what I want to compute, however. 32x32x32 or 128x128 is a common use case. For higher precision and stiffer problems (w tends to be a step function), the size can go up to 256x256x256.

For what it’s worth, this is almost the same as solving the Schrödinger equation in quantum mechanics:

i \frac{\partial}{\partial t} v(t) = Q v(t) \quad \Leftrightarrow \quad v(t) = \exp[{-i Q t}] v(0)

with a complex vector v (usually called \Psi, and Q is usually called H, the Hamiltonian).

The (provably) most efficient way to solve that to high precision is to expand the exponential into Chebychev polynomials, as long as the following requirements hold:

  • Q has real (or purely imaginary) eigenvalues
  • There is a decent estimate for the spectral range of Q (lowest and highest eigenvalue). This is because Chebychev polynomials are defined on [-1, 1], so the expansion has to use a normalized Q. It’s okay to overestimate the spectral range a little bit (a “pessimistic” guess)

The nice thing about using a Chebychev expansion is that the expansion coefficients for the exponential specifically can be calculated analytically with Bessel functions. The entire expansion is also really short to code up, and the recursive implementation is very fast.

I have an implementation of the quantum case at QuantumPropagators.jl/src/cheby.jl at master · JuliaQuantumControl/QuantumPropagators.jl · GitHub (docs). There’s a very brief summary of the method in my thesis, chapter 3.2.1, and of course the original literature (see references).

You’d only need the cheby_coeffs and cheby functions, and adapt them to the standard exponential without the imaginary i. It’s been a while, but if I recall correctly, the regular exponential just uses the modified Bessel function (besseli instead of besselj) and you kill the i in the definition of c and in the final phase factor. You can probably also find some discussion of the Chebychev expansion of a standard exponential on Google, e.g., these slides.

It would be best if your time step was equidistant, because then you can use the same coefficients for all time steps. Otherwise, you’ll have to recalculate them for each dt. Or you can probably use the coefficients for the largest dt if they don’t vary too much (should be equivalent to a more pessimistic estimation of the spectral range, which only means you’re using more terms than you need to)
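To illustrate the real-exponential variant: a minimal sketch (cheby_expv is a made-up name, not the QuantumPropagators.jl API) that expands \exp(tA)v in Chebychev polynomials with modified-Bessel coefficients, assuming the real spectrum of A lies inside the supplied estimate [λmin, λmax]:

```julia
using LinearAlgebra, SpecialFunctions

# Chebychev expansion of exp(t*A)*v for a real matrix A whose spectrum is
# (estimated to be) inside [λmin, λmax]. The coefficients are modified
# Bessel functions; the T_k(B)*v terms come from the standard recurrence.
function cheby_expv(A, v, t, λmin, λmax; tol = 1e-12, kmax = 200)
    c = (λmax - λmin) / 2          # half the spectral range
    d = (λmax + λmin) / 2          # spectral center
    B = (A - d * I) / c            # normalized matrix, spectrum in [-1, 1]
    x = t * c
    Tkm1 = v                       # T_0(B) * v
    Tk = B * v                     # T_1(B) * v
    w = besseli(0, x) * Tkm1 + 2 * besseli(1, x) * Tk
    for k in 2:kmax
        a = 2 * besseli(k, x)
        Tkp1 = 2 * (B * Tk) - Tkm1     # Chebychev recurrence
        w += a * Tkp1
        abs(a) * norm(Tkp1) < tol * norm(w) && break
        Tkm1, Tk = Tk, Tkp1
    end
    return exp(t * d) * w          # undo the spectral shift
end
```

The coefficients 2 I_k(x) decay superexponentially once k exceeds x = t(λmax - λmin)/2, which is why overestimating the spectral range only costs a few extra terms.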


I’d be interested to see how much faster this could be made with Bessels.jl. It’s often a lot (~10x) faster than the SpecialFunctions versions.


That decent estimate should be easy to come by using some form of sparse diagonalization method as the extremal eigenvalues converge rapidly :slight_smile: After having found the first eigenvalue, you just shift the matrix by that and then run the method again to get the other extremal eigenvalue.
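For a symmetric Q, that shift-and-iterate idea can be sketched with plain power iteration (extremal_eigs is a made-up name; for serious use, a proper sparse eigensolver like KrylovKit.jl or Arpack.jl would converge much faster):

```julia
using LinearAlgebra

# Rough spectral-range estimate for a symmetric matrix: power iteration
# finds the eigenvalue of largest magnitude; shifting by it and iterating
# again finds the other end of the spectrum.
function extremal_eigs(Q; iters = 1000)
    n = size(Q, 1)
    power(M) = begin
        x = normalize(randn(n))
        for _ in 1:iters
            x = normalize(M * x)
        end
        dot(x, M * x)              # Rayleigh quotient
    end
    λ1 = power(Q)                  # eigenvalue of largest magnitude
    λ2 = power(Q - λ1 * I) + λ1    # dominant eigenvalue of the shifted matrix
    return minmax(λ1, λ2)          # (lowest, highest)
end
```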


Reading and appreciating the nice discussion :+1:


Maybe I am missing something, but from the slides (e.g. slide 17) the Chebyshev expansion also requires bounding the norm, i.e. scaling and squaring.

I guess I could bound it for the largest t?