Generic code, iterative methods and automatic differentiation

Bernard_GODARD · September 26, 2018, 7:17pm

In Julia it is quite easy to write generic code that works with many number types (e.g. BigFloat) or with many automatic differentiation algorithms. However, suppose this code contains an iterative part, such as:
find t such that g(x(t,p),p) = 0, where x is some N-dimensional trajectory (possibly numerically integrated), t is the independent variable of the trajectory (e.g. time) and p is a parameter vector. It seems a pity to propagate the differentiation scheme through all the iterations, since the derivative can be obtained from the solution alone:

\frac{dt}{dp} = \frac{ - \frac{\partial g}{\partial x} \frac{dx}{dp} - \frac{\partial g}{\partial p} }{ \frac{\partial g}{\partial x} \frac{dx}{dt} + \frac{\partial g}{\partial t} }

That is, first find the solution without derivatives. Then, at the solution, use an autodiff scheme to obtain all the derivatives

\frac{\partial g}{\partial x}, \frac{dx}{dp}, \frac{\partial g}{\partial p}, \frac{dx}{dt}, \frac{\partial g}{\partial t}

and then compute

\frac{dt}{dp}

But then this part of the code is not generic anymore. I am wondering how people deal with this kind of problem in large modelling applications.
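
For concreteness, here is a minimal sketch of the two-stage approach described above, using ForwardDiff. The trajectory `traj`, the event function `g`, the `bisect` helper and the parameter values are all made up for illustration; a real trajectory would come from a numerical integrator. The root is found in plain floating point, and the derivatives are evaluated only once, at the solution.

```julia
using ForwardDiff

# Hypothetical closed-form trajectory (stands in for an integrated one):
# height of a projectile with launch speed p[1] and gravity p[2].
traj(t, p) = p[1] * t - 0.5 * p[2] * t^2

# Hypothetical event function: zero when the height reaches the threshold p[3].
g(x, p) = x - p[3]

# Composite residual; its root in t defines the event time.
F(t, p) = g(traj(t, p), p)

# Derivative-free root finder (plain bisection on a sign-changing bracket).
function bisect(f, lo, hi; tol = 1e-12)
    flo = f(lo)
    while hi - lo > tol
        mid = (lo + hi) / 2
        fmid = f(mid)
        if sign(fmid) == sign(flo)
            lo, flo = mid, fmid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

p = [3.0, 2.0, 2.0]

# Stage 1: solve for the event time without any derivative information.
t_star = bisect(t -> F(t, p), 0.0, 1.5)

# Stage 2: differentiate once, at the solution only.
Ft = ForwardDiff.derivative(τ -> F(τ, p), t_star)  # total derivative of F w.r.t. t
Fp = ForwardDiff.gradient(q -> F(t_star, q), p)    # total derivative of F w.r.t. p

# Implicit function theorem: dt/dp = -(dF/dp) / (dF/dt).
# For these values the closed-form answer is t = 1 and dt/dp = [-1, 0.5, 1].
dtdp = -Fp ./ Ft
```

Differentiating the composite `F` lets the AD tool apply the chain rule through `g` and `traj`, so the individual partial derivatives in the formula never have to be assembled by hand.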

Thank you.

ChrisRackauckas · September 26, 2018, 9:02pm

This is very close to the difference between continuous sensitivity analysis methods ( http://docs.juliadiffeq.org/latest/analysis/sensitivity.html ) and discrete sensitivity analysis (AD). Continuous sensitivity analysis just builds another ODE while discrete sensitivity analysis propagates derivatives through the code (AD). We will be putting a paper out soon that shows that, unless the ODE is derived at compile time, discrete sensitivity analysis is much much faster.

That said, this means that we will want to continue to optimize discrete sensitivity analysis, and this piece right here is an optimization that can be done in the implicit solvers. It can be done almost generically. I mentioned in the Slack the other day that technically it cannot be done generically, but only because ForwardDiff.value is a different function from ReverseDiff.value which is a different function from Measurements.value etc. If a generic value function was in Julia Base, this could be written down. For now, it’s a smallish optimization so we aren’t worried about it, but it is something we will add to all of DifferentialEquations.jl when the time comes.

It’s still generic, it’s just optimized. Even if you add specific handling for ForwardDiff.Dual it’s still generic, just better at handling standard cases. That is something to keep in mind with Julia code: multiple dispatch is about building the generic code and optimizing based on datatypes when you want to, not when you have to.
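
As a rough illustration of that point, the sketch below defines a generic accessor with a fallback method and one specialization for `ForwardDiff.Dual`. The names `primal` and `converged` are hypothetical and are not part of Base, ForwardDiff or DifferentialEquations.jl; other number types (ReverseDiff, Measurements) would each need their own method.

```julia
using ForwardDiff

# Hypothetical generic accessor: strip derivative information, keep the plain value.
# Fallback for ordinary numbers: nothing to strip.
primal(x::Real) = x

# Specialization for ForwardDiff duals (recursive, so nested duals are unwrapped too).
primal(x::ForwardDiff.Dual) = primal(ForwardDiff.value(x))

# A "control" decision written once; it works for any number type with a primal method.
converged(residual; tol = 1e-10) = abs(primal(residual)) < tol
```

With this in place, `converged(ForwardDiff.Dual(1e-12, 5.0))` makes its decision from the value `1e-12` only and ignores the partial `5.0`, while `converged(0.5)` goes through the fallback.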

More generally, this idea of being smart with generic handling and separating the true continuous solution from the “control” parts of code, both for AD and other applications like Measurements, is something that we are investigating quite deeply. It brings a whole new element to optimizing generic programming.
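
A toy version of this separation can be sketched as follows. This is not the DifferentialEquations.jl implementation; `solve_adaptive`, `control_norm` and the Heun/Euler pair are assumptions chosen to keep the example short. The state may carry duals, while `t`, `dt` and the error estimate seen by the step controller stay plain `Float64`.

```julia
using ForwardDiff

# Hypothetical controller-side norm: always returns a plain number,
# so accept/reject decisions never involve duals.
control_norm(e) = abs(ForwardDiff.value(e))

# Adaptive Heun/Euler pair for a scalar ODE u' = f(u, p) on [0, T].
function solve_adaptive(f, u0, p, T; tol = 1e-8)
    t, dt, u = 0.0, 1e-2, u0
    while t < T
        dt = min(dt, T - t)
        k1 = f(u, p)
        k2 = f(u + dt * k1, p)
        u_new = u + dt * (k1 + k2) / 2           # Heun step (2nd order)
        err = control_norm(dt * (k2 - k1) / 2)   # Heun minus Euler, as a Float64
        if err <= tol                            # non-differentiable control decision
            t += dt
            u = u_new
        end
        # Step-size update based on the plain error estimate only.
        dt *= clamp(0.9 * sqrt(tol / max(err, 1e-16)), 0.2, 5.0)
    end
    return u
end

decay(u, p) = -p * u

# Differentiate the final state with respect to the parameter.
dudp = ForwardDiff.derivative(p -> solve_adaptive(decay, 1.0, p, 1.0), 2.0)

# Exact solution u(1) = exp(-p), so du/dp = -exp(-p).
exact = -exp(-2.0)
```

For a given sequence of accepted steps the loop is an ordinary differentiable program in `u`, which is why `dudp` should land close to `exact` even though the step selection itself is not differentiable.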

Pull request in github.com/SciML/DiffEqBase.jl: Fix sensitivities and don't require time to be dual valued
SciML:master ← SciML:time_insensitive · opened 10:16PM - 22 Sep 18 UTC by ChrisRackauckas · +6 -29

This deserves an explanation. While fixing https://github.com/JuliaDiffEq/DiffEq…Sensitivity.jl/issues/24 I noticed that using dual numbers to get sensitivities worked with `saveat` but could give incorrect values on the directly saved values. This seemed odd, and I tracked it down to being due to adaptive time stepping by turning off adaptivity and seeing it go away. `saveat` values have no sensitivity/uncertainty in time, so that made sense as well.

But to fix it, I implemented the change of this PR. What this essentially does is it separates time from the dual valued state space. This has been a long running noted oddity that in order to differentiate you needed to set time to dual numbers as well (though this issue points out it was only correct with `saveat`, which all parameter fitting used so it went unnoticed before). This has always been troubling since the ODE solver itself is not a differentiable algorithm, but it kind of worked. The way that https://github.com/JuliaDiffEq/DiffEqSensitivity.jl/issues/24 failed though is that the sensitivities were correct until the first step rejection, showing that the non-differentiability was indeed an issue.

However, the adaptive algorithm's only input from state is the norm of the error. Basically you can think of it as, given the norm of the error calculate what the new time steps should be (or whether the current step should be rejected and reduced, the non-differentiable part). If you think of the time steps as given, then the solver with pre-determined time steps is a differentiable program. By declaring the norm values to be not related to the sensitivities, they are in a sense given without reference to the dual numbers and thus the program for the state values is differentiable and this all works. It's a little bit more difficult than that because we can use a norm which for example adds the norm of the dual and value parts. As long as said norm isn't a dual number it's fine.

You can think of this as a differentiable program with a control, and the duals shouldn't propagate through the controls which just pick the next `dt`, and it can or cannot use the duals in said controls as it pleases since the true solution is independent of the choice of `dt`. Thus in some sense, this program, the full ODE solver, lives in a space that's larger than the set of differentiable programs but is ADable. I didn't realize that before but it's kind of cool. Anyways, the following code now works:

```julia
using ForwardDiff
using ForwardDiff: Partials, Dual
using OrdinaryDiffEq
using Plots
using Calculus

#forward differentiation
function func(du,u,p,t)
  du[1] = p[1] * u[1] - p[2] * u[1]*u[2]
  du[2] = -3 * u[2] + u[1]*u[2]
end

p1 = 1.5
p2 = 1.0
u0 = [1.0, 1.0]
tspan = (0.0,10.0)
p = [p1, p2]
p1dual = Dual{Float64}(p1, (1.0, 0.0))
p2dual = Dual{Float64}(p2, (0.0, 1.0))
pdual = [p1dual, p2dual]
prob_dual = ODEProblem(func,eltype(pdual).(u0),tspan,pdual)
sol_dual = solve(prob_dual,Tsit5(),saveat=0.0:0.1:10.0, reltol = 1e-15)
sol_dual2 = solve(prob_dual,Tsit5(), reltol = 1e-15)

timepoints = [i for i in sol_dual.t]
sensitivity_forward_diff = [i[1].partials.values[1] for i in sol_dual.u]
Plots.plot(timepoints,sensitivity_forward_diff)

timepoints = [i for i in sol_dual2.t]
sensitivity_forward_diff = [i[1].partials.values[1] for i in sol_dual2.u]
Plots.plot!(timepoints,sensitivity_forward_diff)

using ParameterizedFunctions

# sensitivity ODE
f_ode_sen = @ode_def_nohes test_sensitivity begin
  du1 = p1 * u1 - p2 * u1*u2
  du2 = -3 * u2 + u1*u2
end p1 p2

prob_ode_sen = ODELocalSensitivityProblem(f_ode_sen,u0,tspan,p)
sol_ode_state_and_sen = solve(prob_ode_sen,Tsit5(),reltol = 1e-9)
timepoints2 = [i for i in sol_ode_state_and_sen.t]
sensitivity_ode_sol = [i[3] for i in sol_ode_state_and_sen.u]
state_ode_sol = [i[3] for i in sol_ode_state_and_sen.u]
Plots.plot!(timepoints2,sensitivity_ode_sol)
```

![sensitivity_plot](https://user-images.githubusercontent.com/1814174/45922170-9c747c00-be79-11e8-94cb-be5bb4b53ac1.png)

Time doesn't need to be made dual in order for the solver to work with things like ForwardDiff. This should also make things easier for a lot of users. Additionally, Measurements now work without time-based measurements as well:

```julia
using OrdinaryDiffEq, Measurements

# Half-life of radiocarbon, in thousands of years
c = 5.730 ± 0.040

#Setup
u₀ = 1 ± 0
tspan = (0.0, 1.0)

#Define the problem
radioactivedecay(u,p,t) = -c * u

#Pass to solver
prob = ODEProblem(radioactivedecay, u₀, tspan)
sol = solve(prob, Tsit5())
```

Pinging all of those who will care about this change and its discussion. @tkoolen @jrevels @ArnoJL @giordano @SebastianM-C & https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/419, @dkarrasch & https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/225, https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/202, @YingboMa, @Vaibhavdixit02.

Bernard_GODARD · September 27, 2018, 2:18pm

Thank you for your reply.

There is a typo in your doc. The left hand side of the ODE is using state variable u while the right hand side uses y.

Interesting. We are using both techniques at my workplace and AD is much slower, but I cannot really compare specifics because the two pieces of code (one in Fortran, the other in C++) are different in so many other ways.
In your comparison, are you using AD to obtain the coefficients of the linear sensitivity ODE (and if so, I guess it is the same AD technique as the one you are benchmarking against)?
In which journal will you be publishing?

ChrisRackauckas · September 28, 2018, 4:37pm

Yes, we are using ForwardDiff to generate the Jacobian if the user doesn’t supply one. You can take a look at our code here:

https://github.com/JuliaDiffEq/DiffEqSensitivity.jl/blob/master/src/local_sensitivity.jl#L36-L66

and it ties into DiffEqDiffTools. We’re going to take one last pass through checking for optimizations before really committing to the conclusion, but so far haven’t found anything.
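
To make the idea concrete, here is a minimal sketch of how the right-hand side of a continuous sensitivity ODE can be assembled with ForwardDiff when no Jacobian is supplied. It is not the DiffEqSensitivity.jl code linked above; `f` is an out-of-place version of the system from the PR example, and `sensitivity_rhs` is a hypothetical helper. With S = du/dp, the augmented system is u' = f(u, p) and S' = (∂f/∂u) S + ∂f/∂p.

```julia
using ForwardDiff

# Out-of-place right-hand side of the two-parameter system from the PR example.
f(u, p) = [p[1] * u[1] - p[2] * u[1] * u[2],
           -3 * u[2] + u[1] * u[2]]

# Hypothetical helper: time derivative of the state u and of S = du/dp (n × m matrix).
function sensitivity_rhs(u, S, p)
    J  = ForwardDiff.jacobian(v -> f(v, p), u)   # ∂f/∂u, obtained by AD
    Fp = ForwardDiff.jacobian(q -> f(u, q), p)   # ∂f/∂p, obtained by AD
    return f(u, p), J * S + Fp
end

u0 = [1.0, 1.0]
p  = [1.5, 1.0]
S0 = zeros(2, 2)   # the initial condition does not depend on p

du, dS = sensitivity_rhs(u0, S0, p)
```

The pair `(du, dS)` could then be flattened into one vector and handed to any ODE solver, which is the sense in which continuous sensitivity analysis "just builds another ODE".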

Hopefully it gets done by then!
