Matrix exponential * vector for various scales

I think you @Tamas_Papp should tell us a little more about your Q matrix and how many different t values you need.

Maybe you can just sample 5 different ones from your application and post them?

Also, what is the thing that Q represents?

My mental model is that you have Markov-process transition rates on some graph of states, with possibly some structural constraints (e.g. sparsity, automorphisms, etc.) and numerical values that are “generic” (i.e. not fine-tuned to sit at some special critical value) and not huge (i.e. norm(Q) is not giant). My mental model is that the transition graph is strongly connected (otherwise you’d have split it up already!).

Generally, you know that you have a leading eigenvalue of 0 (your stationary distribution), and the other ones are all stable (negative real part), and likely many will be quite stable (largish negative real parts).

Absent structural reasons you can expect your eigenspaces to be of dimension 1, with non-tiny spectral gaps (random matrix theory: eigenvalues repel) – so diagonalization would be possible without numerical blow-up.
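
For intuition, a tiny toy sketch (a random generator matrix of my own, not your Q) that shows this spectral picture:

using LinearAlgebra, Random

# Toy CTMC generator (not the actual Q from this thread): positive
# off-diagonal rates, rows summing to zero.
Random.seed!(1)
n = 50
Q = rand(n, n)
Q[diagind(Q)] .= 0
Q[diagind(Q)] .= -vec(sum(Q, dims = 2))

λ = eigvals(Q)
maximum(real, λ)          # ≈ 0: the stationary mode
sort(real(λ))[end - 1]    # next eigenvalue: strictly negative (spectral gap)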

If you conceptually come from a discretized PDE, then your Q is an approximation of an unbounded operator and you don’t want to simply apply it to vectors (in ODE-speak, you would need stiff solvers).

If you need a lot of different t values, then the tradeoffs change in terms of “how much time can we spend on matrix factorization”.

1 Like

Indeed, there’s too much structure. Runge-Kutta methods use none of that structure. Don’t use them on linear ODEs; use methods specialized for linear equations when you can.

The Krylov subspace methods switch over to Lanczos to exploit this if the matrix type is set correctly in the ExponentialUtilities call.
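
Something like this, I assume (a sketch; my understanding is that the Symmetric/Hermitian wrapper is what guarantees the short-recurrence Lanczos path rather than full Arnoldi — check the ExponentialUtilities docs for the exact dispatch):

using LinearAlgebra, ExponentialUtilities

A = randn(500, 500)
A = (A + A') / 20                   # symmetric test matrix with modest norm
v = randn(500)

# Declaring the structure via the wrapper type is the safe way to make sure
# the short-recurrence Lanczos process is used instead of full Arnoldi.
w = expv(1.0, Symmetric(A), v)

w ≈ exp(A) * v                      # sanity check against the dense exponential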

I would be highly surprised if that’s actually beating a higher-order IMEX or even just a BDF with an appropriate sparse matrix. Low-order operator-splitting methods have been fairly good for textbooks but not great for actual computation, from what I’ve found. But either way, this is pretty tailor-made for KIOPS.

1 Like

If the matrix is only 500x500, dense methods may still be faster than iterative methods, especially if each step is only O(n^2) (because you do the diagonalization only once).
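
For concreteness, a sketch of the factor-once approach, with a random stand-in for Q and assuming the eigendecomposition is safe to use (see the caveats below):

using LinearAlgebra

n = 500
Q = randn(n, n)                 # stand-in for the actual generator
v = randn(n)
ts = [0.1, 0.7, 1.3, 2.9]

F = eigen(Q)                    # O(n^3), done once
y = F.vectors \ v               # also done once

# Each additional t only costs a couple of O(n^2) operations.
# (real() because Q is real; the imaginary parts are roundoff.)
expQt_v(t) = real(F.vectors * (exp.(F.values .* t) .* y))

results = [expQt_v(t) for t in ts]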

2 Likes

I wouldn’t look at eigenvalues alone. For non-normal matrices, a large condition number of the matrix of eigenvectors can be a problem regardless of the distribution of eigenvalues. If you have a well conditioned matrix of eigenvectors, diagonalization should work fine, otherwise it won’t. In terms of eigenvalues, if A is close to a matrix with repeated eigenvalues, it is also close to a matrix with an ill-conditioned eigenvector matrix. But since matrices with ill-conditioned eigenvector matrices can have sensitive eigenvalues, a matrix with a harmless looking set of eigenvalues can sometimes be very close to a matrix that is not diagonalizable. Pseudospectra can give more information about potential problems, but looking at the condition number of the matrix of eigenvectors seems a more direct way to check for a specific problem with computing a matrix exponential using diagonalization.
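
In code, that check can be as direct as the following sketch (the 1e6 threshold is an arbitrary choice of mine):

using LinearAlgebra

function expv_via_eigen(Q, t, v; κmax = 1e6)
    F = eigen(Q)
    κ = cond(F.vectors)         # condition number of the eigenvector matrix
    κ > κmax && @warn "ill-conditioned eigenvector matrix; diagonalization may be inaccurate" κ
    # result may carry a tiny imaginary roundoff part for real Q
    F.vectors * (exp.(F.values .* t) .* (F.vectors \ v))
end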

One thing I would like to know is if there are results on conditioning of eigenvector matrices of random matrices. In general, I don’t know as much about random matrices as I would like.

1 Like

The operator-splitting (OS) method is not very accurate, but it is pretty fast. So in our community it is common to use it for testing and experimenting.

2 Likes

PS: What this country needs is a \exp(18\delta A). https://cs.uwaterloo.ca/~shallit/Papers/change2.pdf

There is a weird connection to linear programming…

From the slides (e.g. slide 17), the Chebyshev expansion also requires bounding the norm, i.e. scaling and squaring.

Maybe I shouldn’t have linked those slides :wink: I don’t think that’s right… that’s probably for the case where you can’t (or don’t want to) independently estimate the extremal eigenvalues. If you have those, you can just use them to implicitly normalize the matrix in the series expansion.

That decent estimate should be easy to come by using some form of sparse diagonalization method, as the extremal eigenvalues converge rapidly.
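
For example (a sketch using KrylovKit.jl, which is my choice of package and not something from earlier in the thread; in practice you may need a larger Krylov dimension for the slow end of the spectrum):

using KrylovKit, SparseArrays

Q = sprandn(1000, 1000, 0.01)        # stand-in sparse operator

# Ritz estimates of the eigenvalues with largest / smallest real part;
# the extremal ones converge quickly and bracket the spectrum on the real axis.
vals_hi, _ = eigsolve(Q, 1, :LR)
vals_lo, _ = eigsolve(Q, 1, :SR)
λmax, λmin = first(vals_hi), first(vals_lo)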

Indeed, often you actually know the spectral radius of your operator a-priori (because of “physics”), or else you can use some Krylov-type method to get an estimate for the smallest and largest eigenvalue pretty easily.

Oh, and also the slides are about actually constructing the matrix exp(Q*dt). You can do that with a Chebychev expansion, but if you actually want the application of the exponential operator to a vector, you want to pull that vector into the series, and only do matrix-vector products.
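
To make that concrete, here is a rough sketch of the real-exponential case, i.e. exp(dt * Q) * v, using the Chebychev recurrence with modified-Bessel coefficients. This is my reconstruction under the assumption that the spectrum of Q is (approximately) real and contained in [λmin, λmax]; it is not the code from the repo linked below, and for large r * dt you would want the scaled besselix to avoid overflow.

using LinearAlgebra, SpecialFunctions   # besseli = modified Bessel I_k

# Sketch: y ≈ exp(dt*Q)*v, assuming the spectrum of Q lies roughly in [λmin, λmax].
function cheby_expv(Q, v, dt, λmin, λmax; tol = 1e-12, kmax = 200)
    c = (λmax + λmin) / 2              # spectral center
    r = (λmax - λmin) / 2              # half-width
    α = r * dt
    B = (Q - c * I) / r                # spectrum of B is (roughly) in [-1, 1]
    # exp(α*x) = I_0(α) + 2 Σ_{k≥1} I_k(α) T_k(x) on [-1, 1], with the
    # Chebychev recurrence applied directly to the vector:
    Tprev = copy(v)                    # T_0(B) v
    Tcur  = B * v                      # T_1(B) v
    w = besseli(0, α) * Tprev + 2 * besseli(1, α) * Tcur
    for k in 2:kmax
        Tnext = 2 * (B * Tcur) - Tprev
        coef = 2 * besseli(k, α)
        w .+= coef .* Tnext
        abs(coef) * norm(Tnext) < tol * norm(w) && break
        Tprev, Tcur = Tcur, Tnext
    end
    exp(c * dt) * w                    # undo the shift by the spectral center
end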

I made a MWE at GitHub - goerz-testing/cheby_for_exp: Chebychev expansion of `exp(H dt) v` with the original quantum code (exp(-1im * Q * dt) v) in cheby_for_exp.jl, and then I tried the modification for exp(-1 * Q * dt) v in cheby_for_exp_relax.jl. It’s not quite right, though, and I probably shouldn’t allow myself to get nerd-sniped into spending too much more time on figuring this out :wink:

The exp(-1 * Q * dt) v actually is also somewhat commonly used in quantum mechanics, because it converges to the ground state of Q. I do have some old Fortran code around that does the Chebychev expansion for the exp(-1 * Q * dt) v, so maybe at some point I can get back to my Julia MWE and compare them in detail to figure out what’s wrong.

There might also be a bit of a complication with figuring out when to truncate the series: for exp(-1im * Q * dt) v the output stays normalized while for exp(Q * dt) v it does not, so that might affect how you detect convergence…

Anyway, maybe you can figure out the remaining issue, but probably you’re better off with the more established methods others have proposed in this thread. (Although I’m pretty sure that if you can get the Chebychev to run, it’s going to be quite competitive in terms of performance).

I’d be interested to see how much faster this could be made with Bessels.jl. It’s often a lot (~10x) faster than the SpecialFunctions versions.

That’s interesting! I’ll definitely test that out.

In my applications, the coefficients aren’t really the bottleneck, since you can usually calculate them once and then reuse them, but that might change if you have a variable dt or something. Still, something for me to benchmark at some point.

Having dealt with a lot of exact diagonalization of (non-Hermitian) quantum systems, I have almost never encountered a situation where using a method other than eigen-decomposition of the full, dense matrix was worth it for matrices of size < 1000.

If you find that whatever you want to do is severely slowed down by this simple approach, then it would be very helpful if you provided some more information on the exact problem: Q, the t's, etc.

If not, I would suggest sticking with eigen. I am curious about the results :slight_smile:

The danger with non-Hermitian systems is that if the matrix is close to defective, eigen-decomposition may be a very inaccurate way to compute the matrix exponential. (See the discussion of Method 14 in this classic paper.)
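
A tiny toy illustration of that failure mode (a made-up 2×2, not an example from the paper):

using LinearAlgebra

ε = 1e-8
A = [1.0 1.0; 0.0 1.0+ε]                # nearly defective: eigenvalues 1 and 1+ε

F = eigen(A)
cond(F.vectors)                          # ≈ 2/ε, i.e. ~2e8

expA_eig = F.vectors * Diagonal(exp.(F.values)) / F.vectors
expA_ref = exp(A)                        # scaling & squaring; accurate here

# The (1,2) entry should be e*(exp(ε)-1)/ε; the eigen-based value loses
# roughly 8 digits to cancellation, while exp(A) is accurate to machine precision.
abs(expA_eig[1, 2] - expA_ref[1, 2]) / expA_ref[1, 2]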

6 Likes

Nice link!

It appears clear that we want method 19 – compute the Schur decomposition, check whether we have almost-colliding eigenvalues / are close to defective, then either diagonalize or partially diagonalize (i.e. make the upper triangular part vanish, or at least become sparser), and then store that (this is the preprocessing).

If we can only partially diagonalize, then each subsequent evaluation is more expensive because we need to process the triangular baggage per t.

Unfortunately, the paper (from 2003, no less) exhibits the typical problem (obligatory xkcd).

Any updates 20 years later? Ideally in library form?

PS. LinearAlgebra.log is an example of Schur with triangular baggage. It doesn’t try to further partially diagonalize, presumably since that wouldn’t pay off without an API to reuse the Schur decomposition.
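
For reference, the no-frills version of that idea (keep the triangular baggage, no further diagonalization) would look like this sketch:

using LinearAlgebra

Q = randn(500, 500)             # stand-in
v = randn(500)
ts = [0.1, 0.7, 1.3, 2.9]

S = schur(Q)                    # Q = Z*T*Z' with T quasi-triangular; done once

# Per t: exponentiate the quasi-triangular factor and rotate back.
# Stock exp() does not exploit the triangular structure, so this is still
# O(n^3) per t, but it avoids inverting an ill-conditioned eigenvector matrix.
expQt_v(t) = S.Z * (exp(t * S.T) * (S.Z' * v))

results = [expQt_v(t) for t in ts]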

From the conclusion of the “… 25 years later” paper, I get the impression that they consider ODE-based methods competitive for e^{At}b, multiple t, but a lot depends on the solver.

Do you mean method 18? Method 19 is splitting.

Block diagonalization can be easier said than done. You need to make sure that the block diagonalizing transformation is well conditioned. Trying to cluster nearby eigenvalues into blocks can help with that, but it’s not really reliable. In fact, finding well conditioned similarities to do this is known to be NP-hard. In the 2008 SIAM book by N. Higham Functions of Matrices, going directly for a blocked form of the Schur-Parlett approach (a blocked form of method 17) is recommended over actually doing the diagonalization.

2 Likes

Is there any particular passage you read that way? Section 9 in the update to the original paper seems to be about research on solving ODEs using the matrix exponential, not finding the matrix exponential using a general ODE solver. Likewise for section 10 on Krylov methods. I’m not seeing much else on ODE solvers in the updates.

The paper is old though, pre-KIOPS. For large systems KIOPS is pretty much the way to go. For smaller systems, there’s other tricks to do.

3 Likes

Hi, Kris, is there an implementation of KIOPS in SciML that I can experiment with to solve my problem? Previously I’ve implemented ETDRK4 for my problem with both periodic and non-periodic boundaries. See

and

2 Likes

What is the difference between ETDRK4 and the ordinary RK4?

See https://www.sciencedirect.com/science/article/abs/pii/S0010465508001483 for a discussion of ETDRK4.
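
Roughly (my summary, not from that paper): for u' = Lu + N(u), ordinary RK4 treats the whole right-hand side explicitly, so a stiff linear part L forces very small steps. Exponential time differencing integrates the linear part exactly via \exp(Lh) and only approximates the nonlinear part; the first-order version is u_{n+1} = e^{Lh} u_n + L^{-1}(e^{Lh} - I) N(u_n), and ETDRK4 applies the same idea at fourth order with Runge–Kutta-style stages.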

2 Likes

And they are used in:

Notice the list:

  • LawsonEuler - First order exponential Euler scheme. Fixed timestepping only.
  • NorsettEuler - First order exponential-RK scheme. Fixed timestepping only. Alias: ETD1.
  • ETD2 - Second order Exponential Time Differencing method (in development). Fixed timestepping only. Doesn’t support Krylov approximation.
  • ETDRK2 - 2nd order exponential-RK scheme. Fixed timestepping only.
  • ETDRK3 - 3rd order exponential-RK scheme. Fixed timestepping only.
  • ETDRK4 - 4th order exponential-RK scheme. Fixed timestepping only.
  • HochOst4 - 4th order exponential-RK scheme with stiff order 4. Fixed timestepping only.

Though note there’s also:

  • Exp4 - 4th order EPIRK scheme.
  • EPIRK4s3A - 4th order EPIRK scheme with stiff order 4.
  • EPIRK4s3B - 4th order EPIRK scheme with stiff order 4.
  • EPIRK5P1 - 5th order EPIRK scheme.
  • EPIRK5P2 - 5th order EPIRK scheme.
  • EPIRK5s3 - 5th order “horizontal” EPIRK scheme with stiff order 5. Broken.
  • EXPRB53s3 - 5th order EPIRK scheme with stiff order 5.

Note that in most cases people would probably recommend the EPIRK methods these days.

However, I’m not convinced any of these methods are that great according to benchmarks that we’ve done. Using exponential integrators has a rather strong requirement that the linear part dominates and the nonlinear part is non-stiff, which seems to be a much stronger assumption than most problems actually satisfy, and it doesn’t always have a clear benefit over just using an IMEX method like KenCarp4.

IMEX methods also specialize on linearity, since they can factorize a Jacobian once and reuse it for the whole integration, or develop a nice preconditioner for a Krylov scheme to converge really fast on its linear operator; both really specialize the linear part at least as much as the exponential integrator, so in the end it comes out to be a wash. There is still a lot to play with, so I don’t think we’ve reached the end of the story (and this is excluding IIF tricks), but for now I’m not so bullish on exponential integrators.
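
For reference, a rough sketch of what the two families look like on a split problem (a toy reaction–diffusion setup of mine; the exact operator/problem types vary across OrdinaryDiffEq versions, so treat this as a sketch rather than a verified benchmark):

using OrdinaryDiffEq, SciMLOperators, SparseArrays

N = 512
A = spdiagm(-1 => ones(N - 1), 0 => -2 .* ones(N), 1 => ones(N - 1)) .* N^2  # stiff linear part
f_nl(du, u, p, t) = (du .= u .- u .^ 3)                                      # non-stiff nonlinear part

u0 = rand(N)
prob = SplitODEProblem(MatrixOperator(A), f_nl, u0, (0.0, 1.0))

sol_imex = solve(prob, KenCarp4())           # IMEX: implicit on A, explicit on the rest
sol_expo = solve(prob, ETDRK4(), dt = 1e-3)  # exponential integrator: exact on A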

4 Likes

Sorry for the delay with the MWE, here it is.

Notes

This is a discrete CTMC approximation of

dx = \mu(x) dt + \sigma(x) dW

bounded to a finite interval. Thus the matrix is sparse.

The irregular ts are deliberate, to rule out scaling & squaring (it is like this in my application).

ExponentialUtilities.expv_timestep is very competitive, 10^{-8} relative error compared to the direct calculation (which I am not sure is the gold standard anyway).

Densities:

Code

using SparseArrays, ExponentialUtilities, LinearAlgebra

###
### setup
###

function add_local_transitions!(Q, x_grid::AbstractVector, i, μ, σ)
    @assert length(x_grid) ≥ 2
    i_begin, i_end = extrema(axes(x_grid, 1))
    Δ(i) = x_grid[i+1] - x_grid[i]
    Δ₋ = Δ(max(i_begin, i - 1))
    Δ₊ = Δ(min(i_end - 1, i))
    x = x_grid[i]
    μx = μ(x)
    z_scale = σ(x)^2 / (Δ₊ + Δ₋)
    d = zero(eltype(Q))
    if i < i_end
        λ = z_scale / Δ₊        # noise up
        if μx > 0
            λ += μx / Δ₊        # drift up
        end
        Q[i, i + 1] += λ
        d += λ
    end
    if i_begin < i
        λ = z_scale / Δ₋
        if μx < 0
            λ += -μx / Δ₋
        end
        Q[i, i - 1] += λ
        d += λ
    end
    Q[i, i] -= d
    Q
end

function make_rate_matrix(x_grid, μ, σ)
    Base.require_one_based_indexing(x_grid)
    N = length(x_grid)
    Q = spzeros(N, N)
    for i in 1:N
        add_local_transitions!(Q, x_grid, i, μ, σ)
    end
    Q
end

###
### runtime
###

μ(x) = -0.5 * (x - 0.5)
σ(x) = x * (1 - x) * 0.2
x_grid = range(0, 1; length = 2^10 + 1)
Q = make_rate_matrix(x_grid, μ, σ)
p0 = normalize([exp(-(x-0.2)^2/0.1) for x in x_grid], 1)
ts = (cospi.(range(1, 0; length = 9)) .+ 1) .* 3
U = ExponentialUtilities.expv_timestep(ts, Q', p0)

map(sum, eachcol(U))
u_end_comparison = exp(collect(Q') .* ts[end]) * p0
maximum(abs, u_end_comparison - U[:, end]) ./ maximum(u_end_comparison) # relative error
1 Like