Hello dear Julia community,
TL;DR: Passing dual numbers (ForwardDiff.jl) into an adaptive-step ODE solver (OrdinaryDiffEq.jl) affects the timestep choice, making the derivative incorrect.
Let’s say we have a function whose value requires solving an ODE, for example f(u_0) = u(u_0, 1), where u(u_0, t) is obtained by integrating \dot u = u from time 0 to time t. If we want to do some optimization/shooting, we may want to compute the derivative of this function w.r.t. u_0.
The solution to the ODE is u(t) = u_0 e^t, therefore f(u_0) = u_0 e^1 = e u_0, and f'(u_0) = e. We will choose u_0 = 1 so that f(u_0) = e and f'(u_0) = e.
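For reference, applying ForwardDiff.jl directly to the closed-form solution recovers the exact derivative, so any discrepancy below has to come from the ODE solve itself (a quick sanity check; f_exact is just a name I use for this illustration):
using ForwardDiff
f_exact(u0) = exp(1) .* u0 # closed-form solution at t = 1, no solver involved
ForwardDiff.jacobian(f_exact, [1.0]) # 1x1 matrix containing exactly e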
One simple way to do this is using ForwardDiff.jl and its high-level interface, e.g. ForwardDiff.jacobian in this case. It computes derivatives using dual numbers, which are passed to the function being differentiated. Unlike finite differences, ForwardDiff.jl provides exact derivatives of the given function. But for this to work properly, the control flow inside the function should not depend on the dual part, and in this case it does: the timesteps chosen by the integrator change depending on whether there is a dual part, and on its value if applicable. In other words, ForwardDiff.jl is not differentiating the expected function, but a slightly different one.
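To illustrate the mechanism I suspect (a toy sketch, not OrdinaryDiffEq.jl's actual step-size controller): if the local error estimate of an adaptive method is measured with a norm taken over the whole dual number, partials included, then the accepted steps differ between a Float64 run and a Dual run even though the real parts agree.
using ForwardDiff: Dual, value, partials

# error measure over the whole dual number (value and partials together)
errnorm(x::Dual) = sqrt(value(x)^2 + sum(abs2, partials(x)))
errnorm(x::Real) = abs(x)

# crude adaptive Euler for u' = u, error estimated by step doubling
function toy_adaptive_euler(u0; tend=1.0, dt=0.1, tol=1e-4)
    u, t, steps = u0, 0.0, Float64[]
    while t < tend
        dt = min(dt, tend - t)
        full = u + dt * u                         # one full step
        half = u + dt/2 * u; half += dt/2 * half  # two half steps
        err = errnorm(half - full)
        if err < tol            # <- control flow depends on the partials!
            u, t = half, t + dt
            push!(steps, dt)
        end
        dt *= clamp(0.9 * sqrt(tol / (err + eps())), 0.2, 5.0)
    end
    return steps
end

# same real part, different step sequences
toy_adaptive_euler(1.0) == toy_adaptive_euler(Dual(1.0, 1.0)) # false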
MRE:
using Pkg
Pkg.activate(; temp=true)
Pkg.add(["OrdinaryDiffEq", "ForwardDiff"])
using OrdinaryDiffEq, ForwardDiff
# s gives the whole solution to the ODE problem,
# u evaluates that solution at time t,
# f takes the value of the solution at the final time
# (s(u0, 1)[end] != s(u0, 1)(1)... but that is for another time)
s(u0, t=1) = solve(ODEProblem((du, u, p, t) -> (du .= u), u0, t, ()))
u(u0, t) = s(u0, t)(t)
f(u0) = u(u0, 1)
u0 = Float64[1]
f_val = f(u0)
f_grad = ForwardDiff.jacobian(f, u0)
@show only(f_val) - only(f_grad) # Should be zero but is not
Further investigations:
- Building the dual numbers by hand gives the same result as ForwardDiff.jacobian:
cfg = ForwardDiff.JacobianConfig(f, u0)
# build a dual-valued u0 from a given seed for the partials
dual_u0(u0, seed) = eltype(cfg.duals).(u0, seed)
u0_dual = dual_u0(u0, cfg.seeds)
f_dual = f(u0_dual)
@show only(f_dual).value - only(f_dual).partials[1] # Is zero
@show only(f_grad) - only(f_dual).partials[1] # Is zero
- Using s, we can see the timesteps are different depending on the input, even if the real part is the same:
@show s(u0).t[2] - s(u0_dual).t[2] # Should be zero but is not
- Changing the seed (x100) also changes the timesteps:
@show s(u0_dual).t[2] - s(dual_u0(u0, 100 .* cfg.seeds)).t[2]
- Setting the seed to zero isn't equivalent to having no seed:
@show s(u0).t[2] - s(dual_u0(u0, 0 .* cfg.seeds)).t[2]
For now I am avoiding this problem by using constant timesteps (solve(prob, Tsit5(); adaptive=false, dt=1e-2)), but I would prefer that the process for choosing the next timestep not depend on the dual part. A possibly more problematic consequence is that some ODEs may become stiff when their derivative is computed.
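For completeness, here is that fixed-timestep workaround applied to the MRE (just a sketch with the Tsit5/dt = 1e-2 choices mentioned above, tspan written out explicitly, and f_fixed being a name I made up for this example):
using OrdinaryDiffEq, ForwardDiff

# same problem as above, but with a fixed step sequence that cannot depend on the dual part
f_fixed(u0) = solve(ODEProblem((du, u, p, t) -> (du .= u), u0, (0.0, 1.0), ()),
                    Tsit5(); adaptive=false, dt=1e-2)(1.0)

u0 = Float64[1]
# value and derivative are now computed on the same time grid, so the mismatch goes away
@show only(f_fixed(u0)) - only(ForwardDiff.jacobian(f_fixed, u0))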
Is this known and expected behavior? Should I open an issue on OrdinaryDiffEq.jl?
Thank you,
abavoil