Differentiating through a Jump Problem

I took a stab at it, but it turns out to be impossible :laughing:. The reason is because after doing the rate calculations, you have to arrive at an integer for which jump is going to occur, and that integer value is going to “break” the differentiation chain (duals come out zero), so AD cannot be directly applied here. Mixings don’t seem to make sense.

At a more fundamental level, I dug up a Petzold paper that made it a bit more explicit. They took a look at the chemical master equations, and sure enough, the resulting sensitivity equations are not something that can be estimated using an SSA algorithm.

Such an equation should be solved simultaneously with the CME. As with the CME, the infinite dimensionality of the coupled sensitivity-CME differential equation makes its analytical solution difficult to construct. Moreover, the SSA cannot be directly applied to solve the sensitivity equation without loss of rigorous physical basis (Gillespie, 1992a). These reasons motivate application of a black-box approach, such as finite difference, to estimate the sensitivity coefficients below

The bigger issue than what they mention there is that the sensitivity (the derivative of a distribution) is not necessarily a distribution itself, so you’d have to find some way of normalizing it on the fly in order to create some sampling based approach for it. It looks like such a normalizing method is derived here:


but AD won’t give you back that same algorithm automatically. Instead, their method can be implemented, replacing finite differences with dual numbers, and it would work, but it’s fundamentally different than just taking the derivative along each trajectory like simple applications of AD here would do.

So, :man_shrugging: this needs an adjoint. Note that certain subsets of this problem are AD-differentiable, i.e. if you just have one rate and want to differentiate the solution with respect to a jump parameter in affect function, and a single jump with a rate can AD though (if time is a Dual number as well, so you’d need to make a small modification). But the moment you aggregate rates you need to do something fundamentally different.

This is pointing to the idea though that the Next Reaction Method is likely AD-differentiable, so it would be nice to do that (in a similar way to SDEs). However, I wouldn’t expect anything doing fancy aggregations to be AD-differentiable because of how it has to break the link between the rate computation and the choice of reaction.

@isaacsas this would be interesting to look more into.