Taking gradients of a matrix exponential

This Taylor-series algorithm is discussed in Section 3 (“Method 1”) of the classic paper “Nineteen Dubious Ways to Compute the Exponential of a Matrix” (Moler and Van Loan, 1978). In short, the naive series is unreliable in floating-point arithmetic no matter how many terms you sum: when the intermediate terms are much larger than the final result, they have to cancel, and that catastrophic cancellation destroys the accuracy.
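To make the cancellation concrete, here is a minimal sketch that compares a naive Taylor sum against the built-in `exp`. The helper `naive_taylor_exp` is hypothetical, written only for this illustration. The test matrix has eigenvalues -1 and -17, and `Float32` is chosen purely to make the effect easy to see (the same loss happens in `Float64`, just at a smaller scale).

```julia
using LinearAlgebra

# Hypothetical helper for illustration: sums A^k / k! for k = 0, ..., nterms.
function naive_taylor_exp(A::AbstractMatrix{T}; nterms::Integer=200) where {T}
    term = Matrix{T}(I, size(A)...)   # current term A^k / k!, starting at k = 0
    total = copy(term)
    for k in 1:nterms
        term = term * A / T(k)        # next term from the previous one
        total += term
    end
    return total
end

A64 = [-49.0 24.0; -64.0 31.0]        # eigenvalues are -1 and -17
A32 = Float32.(A64)                   # entries are integers, so the conversion is exact

reference = exp(A64)                  # built-in matrix exponential in Float64

# Relative errors of the two Float32 computations against the Float64 reference.
err_taylor32  = norm(naive_taylor_exp(A32) - reference) / norm(reference)
err_builtin32 = norm(exp(A32) - reference) / norm(reference)

println("naive Taylor (Float32): ", err_taylor32)
println("built-in exp (Float32): ", err_builtin32)
```

The intermediate terms of the series grow to many orders of magnitude above the entries of the result before they shrink again, so the rounding error in those large terms ends up dominating the sum. Adding more terms does not help, because the damage is already done by the time the series converges.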

(Contrary to popular misconception from first-year calculus, Taylor series are not typically how special functions are computed.)

It’s really much safer to use the built-in exp function here, which means that you need to teach your AD system to use a custom rule (e.g. the one from ChainRules.jl).
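As a rough sketch of what such a custom rule provides, the snippet below calls the reverse-mode rule for `exp` from ChainRules.jl directly, without any particular AD system. It assumes ChainRules.jl is installed, and the test function `tr(exp(A))` and the variable names are my own choices for illustration. How a rule like this gets hooked into your AD package depends on the package, so check its documentation.

```julia
using LinearAlgebra
using ChainRules   # assumed to be installed

A = [-49.0 24.0; -64.0 31.0]

# The rule returns the primal result exp(A) and a pullback closure.
Y, exp_pullback = ChainRules.rrule(exp, A)

# For f(A) = tr(exp(A)), the cotangent seeded into the pullback is the identity.
ΔY = Matrix{Float64}(I, 2, 2)
_, Abar = exp_pullback(ΔY)
Abar = ChainRules.unthunk(Abar)   # materialize the result in case it is lazy

# Analytically, the gradient of tr(exp(A)) with respect to A is exp(A)'.
@show Abar ≈ exp(A)'
```

The point of the design is that the derivative is computed alongside the same robust algorithm that `exp` uses, instead of differentiating through the terms of an unstable series.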
