Matrix exponential * vector for various scales

I think you @Tamas_Papp should tell us a little more about your Q matrix and how many different t values you need.

Maybe you can just sample 5 different ones from your application and post them?

Also, what is the thing that Q represents?

My mental model is that you have Markov-process transition rates on some graph of states, with possibly some structural constraints (e.g. sparsity, automorphisms, etc.) and numerical values that are “generic” (i.e. not fine-tuned to sit at some special critical value) and not huge (i.e. norm(Q) is not giant). My mental model is that the transition graph is strongly connected (otherwise you’d have split it up already!).

Generally, you know that you have a leading eigenvalue of 0 (your stationary distribution), and the other ones are all stable (negative real part), and likely many will be quite stable (largish negative real parts).

Absent structural reasons you can expect your eigenspaces to be of dimension 1, with non-tiny spectral gaps (random matrix theory: eigenvalues repel) – so diagonalization would be possible without numerical blow-up.
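
For intuition, a tiny toy sketch (a random generator matrix of my own, not your Q) that shows this spectral picture:

using LinearAlgebra, Random

# Toy CTMC generator (not the actual Q from this thread): positive
# off-diagonal rates, rows summing to zero.
Random.seed!(1)
n = 50
Q = rand(n, n)
Q[diagind(Q)] .= 0
Q[diagind(Q)] .= -vec(sum(Q, dims = 2))

λ = eigvals(Q)
maximum(real, λ)          # ≈ 0: the stationary mode
sort(real(λ))[end - 1]    # next eigenvalue: strictly negative (spectral gap)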

If you conceptually come from a discretized PDE, then your Q is an approximation of an unbounded operator and you don’t want to simply apply it to vectors (in ODE-speak, you would need stiff solvers).

If you need a lot of different t values, then the tradeoffs change in terms of “how much time can we spend on matrix factorization”.

1 Like

Indeed, there’s too much structure. Runge-Kutta methods use none of that structure. Don’t use them on linear ODEs; use methods specialized for linear equations when you can.

The Krylov subspace methods switch over to Lanczos to exploit this if the matrix type is set correctly in the ExponentialUtilities call.
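
Something like this, I assume (a sketch; my understanding is that the Symmetric/Hermitian wrapper is what guarantees the short-recurrence Lanczos path rather than full Arnoldi — check the ExponentialUtilities docs for the exact dispatch):

using LinearAlgebra, ExponentialUtilities

A = randn(500, 500)
A = (A + A') / 20                   # symmetric test matrix with modest norm
v = randn(500)

# Declaring the structure via the wrapper type is the safe way to make sure
# the short-recurrence Lanczos process is used instead of full Arnoldi.
w = expv(1.0, Symmetric(A), v)

w ≈ exp(A) * v                      # sanity check against the dense exponential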

I would be highly surprised if that’s actually beating a higher-order IMEX or even just a BDF with an appropriate sparse matrix. Low-order operator-splitting methods have been fairly good for textbooks but not great for actual computation, from what I’ve found. But either way, this is pretty tailor-made for KIOPS.

1 Like

If the matrix is only 500x500, dense methods may still be faster than iterative methods, especially if each step is only O(n^2) (because you do the diagonalization only once).
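
For concreteness, a sketch of the factor-once approach, with a random stand-in for Q and assuming the eigendecomposition is safe to use (see the caveats below):

using LinearAlgebra

n = 500
Q = randn(n, n)                 # stand-in for the actual generator
v = randn(n)
ts = [0.1, 0.7, 1.3, 2.9]

F = eigen(Q)                    # O(n^3), done once
y = F.vectors \ v               # also done once

# Each additional t only costs a couple of O(n^2) operations.
# (real() because Q is real; the imaginary parts are roundoff.)
expQt_v(t) = real(F.vectors * (exp.(F.values .* t) .* y))

results = [expQt_v(t) for t in ts]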

2 Likes

I wouldn’t look at eigenvalues alone. For non-normal matrices, a large condition number of the matrix of eigenvectors can be a problem regardless of the distribution of eigenvalues. If you have a well conditioned matrix of eigenvectors, diagonalization should work fine, otherwise it won’t. In terms of eigenvalues, if A is close to a matrix with repeated eigenvalues, it is also close to a matrix with an ill-conditioned eigenvector matrix. But since matrices with ill-conditioned eigenvector matrices can have sensitive eigenvalues, a matrix with a harmless looking set of eigenvalues can sometimes be very close to a matrix that is not diagonalizable. Pseudospectra can give more information about potential problems, but looking at the condition number of the matrix of eigenvectors seems a more direct way to check for a specific problem with computing a matrix exponential using diagonalization.
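
In code, that check can be as direct as the following sketch (the 1e6 threshold is an arbitrary choice of mine):

using LinearAlgebra

function expv_via_eigen(Q, t, v; κmax = 1e6)
    F = eigen(Q)
    κ = cond(F.vectors)         # condition number of the eigenvector matrix
    κ > κmax && @warn "ill-conditioned eigenvector matrix; diagonalization may be inaccurate" κ
    # result may carry a tiny imaginary roundoff part for real Q
    F.vectors * (exp.(F.values .* t) .* (F.vectors \ v))
end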

One thing I would like to know is if there are results on conditioning of eigenvector matrices of random matrices. In general, I don’t know as much about random matrices as I would like.

1 Like

The operator-splitting (OS) method is not very accurate, but it is pretty fast. So in our community it is common to use it for testing and experimenting.

2 Likes

PS: What this country needs is a \exp(18\delta A). https://cs.uwaterloo.ca/~shallit/Papers/change2.pdf

There is a weird connection to linear programming…

From the slides (e.g. slide 17), the Chebyshev expansion also requires bounding the norm, i.e. scaling and squaring.

Maybe I shouldn’t have linked those slides :wink: I don’t think that’s right… that’s probably for the case where you can’t (or don’t want to) independently estimate the extremal eigenvalues. If you have those, you can just use them to implicitly normalize the matrix in the series expansion.

That decent estimate should be easy to come by using some form of sparse diagonalization method, as the extremal eigenvalues converge rapidly.
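
For example (a sketch using KrylovKit.jl, which is my choice of package and not something from earlier in the thread; in practice you may need a larger Krylov dimension for the slow end of the spectrum):

using KrylovKit, SparseArrays

Q = sprandn(1000, 1000, 0.01)        # stand-in sparse operator

# Ritz estimates of the eigenvalues with largest / smallest real part;
# the extremal ones converge quickly and bracket the spectrum on the real axis.
vals_hi, _ = eigsolve(Q, 1, :LR)
vals_lo, _ = eigsolve(Q, 1, :SR)
λmax, λmin = first(vals_hi), first(vals_lo)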

Indeed, often you actually know the spectral radius of your operator a-priori (because of “physics”), or else you can use some Krylov-type method to get an estimate for the smallest and largest eigenvalue pretty easily.

Oh, and also the slides are about actually constructing the matrix exp(Q*dt). You can do that with a Chebychev expansion, but if you actually want the application of the exponential operator to a vector, you want to pull that vector into the series, and only do matrix-vector products.
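
To make that concrete, here is a rough sketch of the real-exponential case, i.e. exp(dt * Q) * v, using the Chebychev recurrence with modified-Bessel coefficients. This is my reconstruction under the assumption that the spectrum of Q is (approximately) real and contained in [λmin, λmax]; it is not the code from the repo linked below, and for large r * dt you would want the scaled besselix to avoid overflow.

using LinearAlgebra, SpecialFunctions   # besseli = modified Bessel I_k

# Sketch: y ≈ exp(dt*Q)*v, assuming the spectrum of Q lies roughly in [λmin, λmax].
function cheby_expv(Q, v, dt, λmin, λmax; tol = 1e-12, kmax = 200)
    c = (λmax + λmin) / 2              # spectral center
    r = (λmax - λmin) / 2              # half-width
    α = r * dt
    B = (Q - c * I) / r                # spectrum of B is (roughly) in [-1, 1]
    # exp(α*x) = I_0(α) + 2 Σ_{k≥1} I_k(α) T_k(x) on [-1, 1], with the
    # Chebychev recurrence applied directly to the vector:
    Tprev = copy(v)                    # T_0(B) v
    Tcur  = B * v                      # T_1(B) v
    w = besseli(0, α) * Tprev + 2 * besseli(1, α) * Tcur
    for k in 2:kmax
        Tnext = 2 * (B * Tcur) - Tprev
        coef = 2 * besseli(k, α)
        w .+= coef .* Tnext
        abs(coef) * norm(Tnext) < tol * norm(w) && break
        Tprev, Tcur = Tcur, Tnext
    end
    exp(c * dt) * w                    # undo the shift by the spectral center
end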

I made a MWE at GitHub - goerz-testing/cheby_for_exp: Chebychev expansion of `exp(H dt) v` with the original quantum code (exp(-1im * Q * dt) v) in cheby_for_exp.jl, and then I tried the modification for exp(-1 * Q * dt) v in cheby_for_exp_relax.jl. It’s not quite right, though, and I probably shouldn’t allow myself to get nerd-sniped into spending too much more time on figuring this out :wink:

The exp(-1 * Q * dt) v actually is also somewhat commonly used in quantum mechanics, because it converges to the ground state of Q. I do have some old Fortran code around that does the Chebychev expansion for the exp(-1 * Q * dt) v, so maybe at some point I can get back to my Julia MWE and compare them in detail to figure out what’s wrong.

There might also be a bit of a complication with figuring out when to truncate the series: for exp(-1im * Q * dt) v the output stays normalized while for exp(Q * dt) v it does not, so that might affect how you detect convergence…

Anyway, maybe you can figure out the remaining issue, but probably you’re better off with the more established methods others have proposed in this thread. (Although I’m pretty sure that if you can get the Chebychev to run, it’s going to be quite competitive in terms of performance).

I’d be interested to see how much faster this could be made with Bessels.jl. It’s often a lot (~10x) faster than the SpecialFunctions versions.

That’s interesting! I’ll definitely test that out.

In my applications, the coefficients aren’t really the bottleneck, since you can usually calculate them once and then reuse them, but that might change if you have a variable dt or something. Still, something for me to benchmark at some point.

Having dealt with a lot of exact diagonalization of (non-Hermitian) quantum systems, I have almost never encountered a situation where using a method other than eigen-decomposition of the full, dense matrix was worth it for matrices of size < 1000.

If you find that whatever you want to do is severely slowed down by this simple approach, then it would be very helpful if you provided some more information on the exact problem: Q, the t's, etc.

If not, I would suggest sticking with eigen. I am curious about the results :slight_smile:

The danger with non-Hermitian systems is that if the matrix is close to defective, eigen-decomposition may be a very inaccurate way to compute the matrix exponential. (See the discussion of Method 14 in this classic paper.)
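
A tiny toy illustration of that failure mode (a made-up 2×2, not an example from the paper):

using LinearAlgebra

ε = 1e-8
A = [1.0 1.0; 0.0 1.0+ε]                # nearly defective: eigenvalues 1 and 1+ε

F = eigen(A)
cond(F.vectors)                          # ≈ 2/ε, i.e. ~2e8

expA_eig = F.vectors * Diagonal(exp.(F.values)) / F.vectors
expA_ref = exp(A)                        # scaling & squaring; accurate here

# The (1,2) entry should be e*(exp(ε)-1)/ε; the eigen-based value loses
# roughly 8 digits to cancellation, while exp(A) is accurate to machine precision.
abs(expA_eig[1, 2] - expA_ref[1, 2]) / expA_ref[1, 2]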

6 Likes

Nice link!

It appears clear that we want method 19 – compute the Schur decomposition, check whether we have almost-colliding eigenvalues / are close to defective, then either diagonalize or partially diagonalize (i.e. make the upper triangular part vanish, or at least become sparser), and then store that (this is the preprocessing).

If we can only partially diagonalize, then each subsequent evaluation is more expensive because we need to process the triangular baggage per t.

Unfortunately, the paper (from 2003, no less) exhibits the typical problem (obligatory xkcd).

Any updates 20 years later? Ideally in library form?

PS. LinearAlgebra.log is an example of Schur with triangular baggage. It doesn’t try to further partially diagonalize, presumably since that wouldn’t pay off without an API to reuse the Schur decomposition.
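
For reference, the no-frills version of that idea (keep the triangular baggage, no further diagonalization) would look like this sketch:

using LinearAlgebra

Q = randn(500, 500)             # stand-in
v = randn(500)
ts = [0.1, 0.7, 1.3, 2.9]

S = schur(Q)                    # Q = Z*T*Z' with T quasi-triangular; done once

# Per t: exponentiate the quasi-triangular factor and rotate back.
# Stock exp() does not exploit the triangular structure, so this is still
# O(n^3) per t, but it avoids inverting an ill-conditioned eigenvector matrix.
expQt_v(t) = S.Z * (exp(t * S.T) * (S.Z' * v))

results = [expQt_v(t) for t in ts]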

From the conclusion of the “… 25 years later” paper, I get the impression that they consider ODE-based methods competitive for e^{At}b, multiple t, but a lot depends on the solver.

Do you mean method 18? Method 19 is splitting.

Block diagonalization can be easier said than done. You need to make sure that the block diagonalizing transformation is well conditioned. Trying to cluster nearby eigenvalues into blocks can help with that, but it’s not really reliable. In fact, finding well conditioned similarities to do this is known to be NP-hard. In the 2008 SIAM book by N. Higham Functions of Matrices, going directly for a blocked form of the Schur-Parlett approach (a blocked form of method 17) is recommended over actually doing the diagonalization.

2 Likes

Is there any particular passage you read that way? Section 9 in the update to the original paper seems to be about research on solving ODEs using the matrix exponential, not finding the matrix exponential using a general ODE solver. Likewise for section 10 on Krylov methods. I’m not seeing much else on ODE solvers in the updates.

The paper is old though, pre-KIOPS. For large systems KIOPS is pretty much the way to go. For smaller systems, there’s other tricks to do.

3 Likes

Hi, Kris, is there an implementation of KIOPS in SciML that I can experiment with to solve my problem? Previously I’ve implemented ETDRK4 for my problem with both periodic and non-periodic boundaries. See

and

2 Likes

What is the difference between ETDRK4 and the ordinary RK4?

See https://www.sciencedirect.com/science/article/abs/pii/S0010465508001483 for a discussion of ETDRK4.
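
Roughly (my summary, not from that paper): for u' = Lu + N(u), ordinary RK4 treats the whole right-hand side explicitly, so a stiff linear part L forces very small steps. Exponential time differencing integrates the linear part exactly via \exp(Lh) and only approximates the nonlinear part; the first-order version is u_{n+1} = e^{Lh} u_n + L^{-1}(e^{Lh} - I) N(u_n), and ETDRK4 applies the same idea at fourth order with Runge–Kutta-style stages.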

2 Likes

And they are used in:

Notice the list:

  • LawsonEuler - First order exponential Euler scheme. Fixed timestepping only.
  • NorsettEuler - First order exponential-RK scheme. Fixed timestepping only. Alias: ETD1.
  • ETD2 - Second order Exponential Time Differencing method (in development). Fixed timestepping only. Doesn’t support Krylov approximation.
  • ETDRK2 - 2nd order exponential-RK scheme. Fixed timestepping only.
  • ETDRK3 - 3rd order exponential-RK scheme. Fixed timestepping only.
  • ETDRK4 - 4th order exponential-RK scheme. Fixed timestepping only.
  • HochOst4 - 4th order exponential-RK scheme with stiff order 4. Fixed timestepping only.

Though note there’s also:

  • Exp4 - 4th order EPIRK scheme.
  • EPIRK4s3A - 4th order EPIRK scheme with stiff order 4.
  • EPIRK4s3B - 4th order EPIRK scheme with stiff order 4.
  • EPIRK5P1 - 5th order EPIRK scheme.
  • EPIRK5P2 - 5th order EPIRK scheme.
  • EPIRK5s3 - 5th order “horizontal” EPIRK scheme with stiff order 5. Broken.
  • EXPRB53s3 - 5th order EPIRK scheme with stiff order 5.

Note that in most cases people would probably recommend the EPIRK methods these days.

However, I’m not convinced any of these methods are that great according to benchmarks that we’ve done. Using exponential integrators has a rather strong requirement that the linear part dominates and the nonlinear part is non-stiff, which seems to be a much stronger assumption than most problems actually satisfy, and it doesn’t always have a clear benefit over just using an IMEX method like KenCarp4.

IMEX methods also specialize on linearity, since they can factorize a Jacobian once and reuse it for the whole integration, or develop a nice preconditioner for a Krylov scheme to converge really fast on its linear operator; both really specialize the linear part at least as much as the exponential integrator, so in the end it comes out to be a wash. There is still a lot to play with, so I don’t think we’ve reached the end of the story (and this is excluding IIF tricks), but for now I’m not so bullish on exponential integrators.
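
For reference, a rough sketch of what the two families look like on a split problem (a toy reaction–diffusion setup of mine; the exact operator/problem types vary across OrdinaryDiffEq versions, so treat this as a sketch rather than a verified benchmark):

using OrdinaryDiffEq, SciMLOperators, SparseArrays

N = 512
A = spdiagm(-1 => ones(N - 1), 0 => -2 .* ones(N), 1 => ones(N - 1)) .* N^2  # stiff linear part
f_nl(du, u, p, t) = (du .= u .- u .^ 3)                                      # non-stiff nonlinear part

u0 = rand(N)
prob = SplitODEProblem(MatrixOperator(A), f_nl, u0, (0.0, 1.0))

sol_imex = solve(prob, KenCarp4())           # IMEX: implicit on A, explicit on the rest
sol_expo = solve(prob, ETDRK4(), dt = 1e-3)  # exponential integrator: exact on A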

4 Likes

Sorry for the delay with the MWE, here it is.

Notes

This is a discrete CTMC approximation of

dx = \mu(x) dt + \sigma(x) dW

bounded to a finite interval. Thus the matrix is sparse.

The irregular ts are deliberate, to rule out scaling & squaring (it is like this in my application).

ExponentialUtilities.expv_timestep is very competitive, 10^{-8} relative error compared to the direct calculation (which I am not sure is the gold standard anyway).

Densities:

Code

using SparseArrays, ExponentialUtilities, LinearAlgebra

###
### setup
###

function add_local_transitions!(Q, x_grid::AbstractVector, i, μ, σ)
    @assert length(x_grid) ≥ 2
    i_begin, i_end = extrema(axes(x_grid, 1))
    Δ(i) = x_grid[i+1] - x_grid[i]
    Δ₋ = Δ(max(i_begin, i - 1))
    Δ₊ = Δ(min(i_end - 1, i))
    x = x_grid[i]
    μx = μ(x)
    z_scale = σ(x)^2 / (Δ₊ + Δ₋)
    d = zero(eltype(Q))
    if i < i_end
        λ = z_scale / Δ₊        # noise up
        if μx > 0
            λ += μx / Δ₊        # drift up
        end
        Q[i, i + 1] += λ
        d += λ
    end
    if i_begin < i
        λ = z_scale / Δ₋
        if μx < 0
            λ += -μx / Δ₋
        end
        Q[i, i - 1] += λ
        d += λ
    end
    Q[i, i] -= d
    Q
end

function make_rate_matrix(x_grid, μ, σ)
    Base.require_one_based_indexing(x_grid)
    N = length(x_grid)
    Q = spzeros(N, N)
    for i in 1:N
        add_local_transitions!(Q, x_grid, i, μ, σ)
    end
    Q
end

###
### runtime
###

μ(x) = -0.5 * (x - 0.5)
σ(x) = x * (1 - x) * 0.2
x_grid = range(0, 1; length = 2^10 + 1)
Q = make_rate_matrix(x_grid, μ, σ)
p0 = normalize([exp(-(x-0.2)^2/0.1) for x in x_grid], 1)
ts = (cospi.(range(1, 0; length = 9)) .+ 1) .* 3
U = ExponentialUtilities.expv_timestep(ts, Q', p0)

map(sum, eachcol(U))
u_end_comparison = exp(collect(Q') .* ts[end]) * p0
maximum(abs, u_end_comparison - U[:, end]) ./ maximum(u_end_comparison) # relative error
1 Like