I wanted to ask if the DifferentialEquations.jl package for Julia implements discrete adjoint sensitivity analysis for adaptive explicit Runge-Kutta methods.
If so, where can I find some documentation describing their algorithm?
If you just use sensealg = ReverseDiffAdjoint(), then it's discrete adjoint sensitivity analysis via ReverseDiff. TrackerAdjoint will work as well, and an Enzyme-based one is almost working.
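The call pattern looks roughly like this (a minimal sketch; the Lotka-Volterra RHS and the sum-of-solution loss are just stand-ins):

```julia
using DifferentialEquations, SciMLSensitivity, Zygote

# Stand-in ODE right-hand side (out-of-place form)
lv(u, p, t) = [p[1]*u[1] - p[2]*u[1]*u[2],
               -p[3]*u[2] + p[4]*u[1]*u[2]]

u0 = [1.0, 1.0]
p  = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lv, u0, (0.0, 10.0), p)

# sensealg = ReverseDiffAdjoint() means the gradient is computed by
# reverse-mode AD (ReverseDiff) applied to the solver steps themselves,
# i.e. the discrete adjoint. TrackerAdjoint() is the Tracker-based analogue.
loss(p) = sum(solve(prob, Tsit5(); p = p, saveat = 0.1,
                    sensealg = ReverseDiffAdjoint()))
dp = Zygote.gradient(loss, p)[1]
```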
I don't use Julia myself, so I was more interested in a description of the scheme they use for discrete adjoint sensitivity analysis. For example, in DENSERKS they describe the continuous adjoint sensitivity analysis for ERK methods. FATODE implements the discrete adjoint method for ERK methods and also provides a description of the scheme.
Basically, there are many different adjoint forms allowed. One class is the discrete sensitivity analysis methods, which are computationally equivalent to applying AD to the solver. ForwardDiff of the solver is equivalent to discrete forward sensitivity analysis, and ReverseDiffAdjoint, TrackerAdjoint, etc. are equivalent to discrete adjoint sensitivity analysis. If you write out the discrete adjoint equations and the AD transformation, there isn't really anything to prove: you can see it's just successive vjp application, and IIRC the FATODE papers discuss this as well.
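Schematically (writing a generic one-step update, not any particular solver): take $u_{n+1} = \Phi(u_n, p)$ and a cost $g(u_N)$ on the final state. Reverse-mode AD of the time-stepping loop computes

$$\lambda_N = \left(\frac{\partial g}{\partial u_N}\right)^{T}, \qquad \lambda_n = \left(\frac{\partial \Phi}{\partial u}(u_n, p)\right)^{T} \lambda_{n+1}, \qquad \nabla_p g = \sum_{n=0}^{N-1} \left(\frac{\partial \Phi}{\partial p}(u_n, p)\right)^{T} \lambda_{n+1},$$

which are exactly the discrete adjoint equations: each transposed-Jacobian application to a $\lambda$ is one vjp.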
Thus, because AD exists within the Julia programming language, there is no need to directly define per-algorithm discrete adjoint methods, since we just use AD to generate the same equations. There is a potential performance consideration, hence the discussion on whether implementing it by hand would decrease overhead over Zygote (Analytic Discrete Adjoint · Issue #556 · SciML/SciMLSensitivity.jl · GitHub), though with the advent of Enzyme the performance issues have been pretty much nullified; we still need to integrate an EnzymeAdjoint option into the system.
I was already following the first article you sent. There, the authors explain the continuous forward and adjoint methods, but not the discrete adjoint method. Is that because what is meant by "discrete adjoint method" is just applying AD to the solver?
From what I've read (and the idea I got from that article as well), the term refers to deriving the adjoint equations by hand from the discrete forward solver equations, and then applying AD to compute the vector-Jacobian products (of the RHS). Is that right? So, since we have to derive the adjoint equations by hand first, the implementation is dependent on the class of the solver. For example, PETSc TSAdjoint seems to follow this approach: "A drawback with TSAdjoint is that the adjoint of each timestepping algorithm must be implemented by the library developers".
A way that it might be made solver-independent (at least for non-multistep solvers, imo) is if we don't care how the step is done and instead express the adjoint equations in terms of the Jacobian of the full step, d u_{n+1}/d u_n.
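To make what I mean concrete: if I'm reading the FATODE description correctly, for an explicit RK step

$$Y_i = u_n + h\sum_{j=1}^{i-1} a_{ij} f(Y_j), \qquad u_{n+1} = u_n + h\sum_{i=1}^{s} b_i f(Y_i),$$

the hand-derived discrete adjoint runs the stages backwards,

$$\theta_i = h\, f_u(Y_i)^{T}\Big(b_i \lambda_{n+1} + \sum_{j=i+1}^{s} a_{ji} \theta_j\Big), \quad i = s, \dots, 1, \qquad \lambda_n = \lambda_{n+1} + \sum_{i=1}^{s} \theta_i,$$

where each $f_u(Y_i)^{T}(\cdot)$ is a vjp of the RHS that AD can supply (and the parameter gradient accumulates the analogous $f_p(Y_i)^{T}$ terms). Summing the stage contributions is the same as applying $(\partial u_{n+1}/\partial u_n)^{T}$ in one shot, which is the solver-independent form I had in mind.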
So, which of these approaches is actually used by Julia currently?
I see from your answer in Analytic Discrete Adjoint · Issue #556 that currently Julia performs AD on the entire solver, right? I'm not sure whether this has an advantage as well, but I've invested way too much time into this not to find out now ahah.
I'm a PhD student in Portugal and I've been working on implementing the discrete adjoint sensitivity analysis method for ERK methods in C++ with an autodiff library developed by my group. I'm writing an article about this work and I wanted to know if DifferentialEquations.jl already does this.
Thank you again for the detailed answer and for your time.
Yes, exactly. We could implement discrete adjoints per solver, but it's mathematically equivalent to the calculation done by applying AD to the solver. So ReverseDiffAdjoint and the discrete adjoint of Tsit5 are the same thing, and therefore we don't implement the hand-written adjoint, because it's less code to get to the same place. What Julia does is: when you differentiate the solver, it does so via whatever the sensealg algorithm is. Some of the choices are discrete adjoints (ReverseDiffAdjoint, TrackerAdjoint), and some are continuous adjoints (InterpolatingAdjoint, BacksolveAdjoint, etc.).
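On the user side the only thing that changes is the sensealg keyword; a rough sketch with a toy ODE (the names and the problem are just placeholders):

```julia
using DifferentialEquations, SciMLSensitivity, Zygote

decay(u, p, t) = -p[1] .* u                    # toy linear decay ODE
prob = ODEProblem(decay, [1.0], (0.0, 1.0), [2.0])

loss(p, sensealg) = sum(solve(prob, Tsit5(); p = p, saveat = 0.1,
                              sensealg = sensealg))

# Discrete adjoint: reverse-mode AD through the solver's steps
g_disc = Zygote.gradient(p -> loss(p, ReverseDiffAdjoint()), [2.0])[1]

# Continuous adjoint: solves the adjoint ODE backwards in time
g_cont = Zygote.gradient(p -> loss(p, InterpolatingAdjoint()), [2.0])[1]

# The two should agree to within solver tolerances
```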
The full table of adjoint methods with trade-offs is: [table not shown]
Then there's a 3rd discrete adjoint method (EnzymeAdjoint) coming pretty soon, and a new continuous checkpointed adjoint whose manuscript should be out soon; the big point there is lower memory usage with good stability. "Good stability" means stable continuous adjoints, which of course are not as stable as discrete adjoint methods (which get "Best") but at least are not numerically unstable like BacksolveAdjoint (see the Stiff Neural Ordinary Differential Equations paper for details on that).
For forward mode, ForwardSensitivity is continuous forward sensitivity analysis and ForwardDiffSensitivity is discrete forward sensitivity analysis.
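Same call pattern, just a different sensealg (again a rough sketch, reusing the toy problem from above):

```julia
using DifferentialEquations, SciMLSensitivity, Zygote

decay(u, p, t) = -p[1] .* u
prob = ODEProblem(decay, [1.0], (0.0, 1.0), [2.0])

# ForwardDiffSensitivity(): discrete forward sensitivity (ForwardDiff through the solver).
# Swapping in ForwardSensitivity() instead uses the continuous forward sensitivity equations.
Zygote.gradient(p -> sum(solve(prob, Tsit5(); p = p, saveat = 0.1,
                               sensealg = ForwardDiffSensitivity())), [2.0])[1]
```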