Yes, effectively. DifferentialEquations.jl defaults to using a norm hack that includes the duals in the adaptivity norm, in order to give you the same time stepping you would get by applying the solver to the forward sensitivity equations, i.e. differentiate-then-discretize. You can change the norm to get the standard AD discretize-then-differentiate behavior, though for adaptive integrators there are some strong reasons to prefer the former, which is the reason for the default.
Yes, that’s what I showed in the talk and linked above:
sol = solve(prob, alg, internalnorm = (u,t)->norm(u))
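For concreteness, here is a minimal sketch of what the two behaviors look like when ForwardDiff is pushed through `solve`. The toy problem `u' = p*u`, the parameter value, and the helper `final_state` are mine for illustration (they are not the example from the talk); `Tsit5` and the `internalnorm` keyword are the real API.

```julia
using OrdinaryDiffEq, ForwardDiff, LinearAlgebra

# Toy problem u' = p*u; the ODE, tspan, and helper are placeholders.
f(u, p, t) = p[1] * u

function final_state(x; solve_kwargs...)
    prob = ODEProblem(f, one(x), (0.0, 10.0), [x])  # promote u0 to the dual type of x
    solve(prob, Tsit5(); solve_kwargs...).u[end]
end

# Default: the duals are included in the adaptivity norm, so the steps are
# controlled by the error of the primal *and* of the derivative.
d_default = ForwardDiff.derivative(
    x -> final_state(x; abstol = 1e-6, reltol = 1e-6), -0.5)

# The override from above: the step control only reacts to the primal values,
# i.e. the standard discretize-then-differentiate stepping.
d_primal_only = ForwardDiff.derivative(
    x -> final_state(x; abstol = 1e-6, reltol = 1e-6,
                     internalnorm = (u, t) -> norm(u)), -0.5)
```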
I don’t think that’s necessarily the issue. The issue is that adaptivity is always done with respect to some error metric, and that adaptivity is also crucial to the stability of the integrator. What can go wrong is that if the adaptivity scheme depends only on the primal process, you have no guarantee of correctness, or even of stability or convergence, of the derivative.
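To make that concrete, here’s a schematic of an error-controlled step (my own illustrative sketch, not the actual OrdinaryDiffEq controller): both the accept/reject decision and the next dt are computed from whatever the norm returns, so anything excluded from that norm simply cannot influence the step sizes.

```julia
# Schematic error-controlled step for a method of order `order`.
# Names and constants are illustrative, not the library's internals.
function schematic_step_control(err, u, dt, abstol, reltol, internalnorm, t; order = 5)
    # Scaled error estimate. If the derivative (dual) components are dropped
    # inside `internalnorm`, they can never affect EEst.
    EEst = internalnorm(err ./ (abstol .+ abs.(u) .* reltol), t)
    if EEst <= 1
        accept = true
        dt_next = dt * min(5.0, 0.9 * EEst^(-1 / (order + 1)))  # grow the step
    else
        accept = false
        dt_next = dt * max(0.2, 0.9 * EEst^(-1 / (order + 1)))  # shrink and retry
    end
    return accept, dt_next
end
```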
In the paper and in the talk I give a linear ODE system where, as you send abstol -> 0 and reltol -> 0, the derivative estimator error for discretize-then-differentiate does not converge to zero, because the step-size adaptivity, which only sees the primal equation, is unaffected. This is an extreme case, but it highlights that with discretize-then-differentiate you might be able to say:
“this solver is adapting time steps based on tol = X, and so it’s automatically choosing some good time steps to give me a relative error I want,”

but you cannot say

“this solver is adapting time steps based on tol = X, and so it’s automatically choosing good time steps to give me a relative error in the derivative I want,”
because, again, there are examples showing that you can set tol = 1e-300 and still get a derivative error of 100, without getting any warning or any error. Decoupling the adaptivity of the solver from the derivative calculation is simply not guaranteed to converge.

And while the worst case might be rare (you could argue the non-convergent cases are unusual), the more troubling part is that even in “not as bad” cases, i.e. cases where you do converge to the correct derivative once the tolerance gets low enough, the derivative can still do something bad at standard solver tolerances and give an unreliable result. This is because one of the main purposes of time-stepping adaptivity is to ensure that your ODE solve is stable, i.e. to choose a time step small enough that you don’t hit an instability. In its normal operation, if the solver detects instability in a step, it rejects that step and pulls back to use a smaller dt. But if you don’t include the derivative values in that adaptivity norm, then it’s possible to get one “bad” derivative step that’s too large, and from then on the rest of your ODE solve is just giving you junk. With discretize-then-differentiate, there are also simple cases you can construct where the primal ODE looks fine, but the derivative ODE has dt in the unstable range and thus gives you a derivative of 100,000,000,000 without any warning or error.
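A cheap way to guard against this in your own problem is to sweep the tolerance and check that the AD derivative actually tightens toward a value you trust. Here is a hedged sketch reusing the toy `final_state` helper from above, where the exact derivative is known in closed form (this is my own sanity check, not the example from the paper):

```julia
# Reuses `f` and `final_state` from the sketch above. For u' = p*u with
# u(0) = 1, the exact derivative of u(10) with respect to p is 10*exp(10p),
# so we can watch whether tightening the tolerance actually tightens the
# derivative under each choice of adaptivity norm.
exact = 10.0 * exp(10.0 * (-0.5))

for tol in (1e-3, 1e-6, 1e-9, 1e-12)
    d_dual = ForwardDiff.derivative(
        x -> final_state(x; abstol = tol, reltol = tol), -0.5)            # default norm
    d_primal = ForwardDiff.derivative(
        x -> final_state(x; abstol = tol, reltol = tol,
                         internalnorm = (u, t) -> norm(u)), -0.5)          # duals excluded
    @show tol abs(d_dual - exact) abs(d_primal - exact)
end
```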
However, if you make this change so that the derivative is included in the adaptivity norm by default, then you are guaranteed a stable derivative calculation (or the solver will error out with a return code), you have convergence as tol -> 0, and your tolerance also means something about the accuracy of the derivative. This is a choice of course, and you can easily swap from one behavior to the other with one line of code, but my view is that if I’m offering an adaptive solver that’s supposed to try and be as correct as possible automatically, that is the more correct behavior.
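On the “error out with a return code” point, here’s a small sketch (again reusing the toy problem above, and assuming the usual SciMLBase.successful_retcode helper for checking return codes) of how the function being differentiated can surface that failure instead of returning junk:

```julia
import SciMLBase

# Reuses `f` from the earlier sketch. With the default (dual-including) norm,
# an unstable derivative shows up as a failed solve, so we can refuse to
# return a value rather than silently hand back a wrong derivative.
function final_state_checked(x)
    prob = ODEProblem(f, one(x), (0.0, 10.0), [x])
    sol = solve(prob, Tsit5())
    SciMLBase.successful_retcode(sol) ||
        error("solve failed with retcode = $(sol.retcode); do not trust the derivative")
    return sol.u[end]
end
```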