DifferentialEquations - Derivatives in ODE function/ nesting AD

Andreas_Schlaginhauf · November 13, 2020, 5:33pm

If the ODE function of a neural ODE includes the gradient f’(x) of a neural network f(x), we need to calculate second order gradients of f(x) in the backward pass. For me this led to all sorts of problems when the AD method is not carefully selected.

Assume the ODE function is g(f(x), f’(x)), where f(x) is again a neural network and g(y) some simple (let’s say rational) function. Any suggestions about the choice of AD method and library for both the calculation of f’(x) and the second order gradients in the adjoint method?

So far, the only combination which worked for me was to use Zygote.gradient() for f’(x) in the forward pass and an optimize-then-discretize adjoint method (such as BacksolveAdjoint, InterpolatingAdjoint) in combination with autojacvec=ZygoteVJP(). However, I could not yet find a working method for the differentiation through the ODE solver (ReverseDiffAdjoint, TrackerAdjoint, `ZygoteAdjoint() all failed). This is especially a problem for DDEs, as there is currently no optimize-then-discretize adjoint implemented there.

Which combinations of AD methods could work here (especially for the DDE case)? Also does it make sense to combine forward and backward AD here and if yes, which libraries work well together?

ChrisRackauckas · November 14, 2020, 5:37pm

There’s a lot of details on new automatic differentiation libraries that will be released fairly soon. Just as quick spoilers, the ARPA-E DIFFERENTIATE program (ARPA-E DIFFERENTIATE program) funded three Julia Computing projects which led to massive efforts over the last year on new AD mechanisms. This has led to new compiler tooling in Julia, with the vast majority being set to merge in Julia v1.7, which includes flexible compiler passes to be written from user code. This fixes essentially all of the issues we had with Cassette.jl and IRTools.jl (and thus the issues of Zygote.jl), which is why those libraries are somewhat in maintenance mode. The new AD, Diffractor.jl, will get a full announcement soon (so think of this as just the trailer ) with a full explanation of how these issues were solved and what it’s being currently used and tested on.

With this new AD, there are projects starting up in the Julia Lab which will use the new composable pass structure to add features to the AD, like mutation and MPI support, to solve the issues of integrating AD with scientific computing code (since these issues are distinctly different from machine learning code). We’re also teaming up with people who had solved such issues in C++ and Fortran AD tools before, so that we have the right expertise on the team to do it correctly.

Again this has been a big project with lots of moving parts and it’s not complete yet, but you’ll start to hear announcements on it fairly soon.

Mixing forward and reverse almost always makes sense for higher order, limiting to only one or two reverses. The arguments for that can be found in Griewank’s tome IIRC and it has to do with how the complexity grows.

Andreas_Schlaginhauf · November 15, 2020, 9:21pm

Thanks a lot for the detailed answer. Those projects sound really interesting and it’s good to hear that AD will be further improved soon. This will make the whole SciML libraries even more amazing In the meantime, I’ll give it a try to implement the adjoint method for DDEs.

Ok great, thanks for the hint!

ChrisRackauckas · November 15, 2020, 10:59pm

Even with the new AD, this will be needed because the missing component cannot be calculated without a derivative rule to catch the discontinuity. The issue to follow is:

github.com/SciML/SciMLSensitivity.jl

Forward sensitivity and adjoints for DDEs

opened 09:11PM - 11 Jun 20 UTC

devmotion

For simple cases we can just differentiate through the DDE solver but if the DDE… system contains parameter-dependent C1-discontinuities the forward sensitivities have jump discontinuities which, e.g., ForwardDiff can't deal with (see https://github.com/SciML/DelayDiffEq.jl/pull/183 and https://epubs.siam.org/doi/abs/10.1137/100814949). Hence it would be great if we could add forward sensitivity equations and adjoint methods for DDEs. I would like to help with getting them in here, I've just never worked on DiffEqSensitivity and hence might need some guidance and/or time to get started. I would assume that (hopefully) one can exploit the existing implementations for ODEs.

An implementation of Enright’s corrections is needed regardless of what AD is used, so it would be much appreciated!

Andreas_Schlaginhauf · November 17, 2020, 2:12pm

Ok, I do have a first implementation of Enright’s DDE adjoint method, but at the moment it still produces slightly different gradients, as opposed to ForwardDiffSensitivity and ReverseDiffAdjoint. So there must be some small bug in my code. But once that is fixed I could make a post in the above issue on github.

ChrisRackauckas · November 17, 2020, 2:19pm

Enright’s is going to be different because AD misses the discontinuity terms of the delay lags. That’s why we need it!

Andreas_Schlaginhauf · November 17, 2020, 2:57pm

Ok, but isn’t this only a problem for the discontinuities coming from the possibly not C1 transition between initial history and DDE solution? Since in my example that transition should be smooth… Or is AD also having problems with the discontinuities in the adjoint state?

Topic		Replies	Views
Discrete Adjoint Sensitivity Analysis for ODES in DifferentialEquations.jl New to Julia	7	759	March 3, 2025
Nested and different AD methods altogether: How to add AD calculations inside my loss function when using neural differential equations? Machine Learning sciml , ad , neural-network , differentialequation	9	996	September 28, 2024
Sensitivity fail on DAEs Modelling & Simulations sciml	4	490	May 3, 2023
Odd behaviour regarding Adjoint Sensitivity Analysis Numerics question , sciml , differentialequation	8	114	October 11, 2024
ForwardDiff + Adaptive ODE Solvers: Timestep Issue Leads to Incorrect Derivatives Numerics forwarddiff , ordinarydiffeq	5	199	February 14, 2025

DifferentialEquations - Derivatives in ODE function/ nesting AD

Related topics