ForwardDiff evaluation of objective in Optim returns wrong result (objective uses DifferentialEquations pkg)

I am having a strange problem with ForwardDiff that is internally used in Optim. The minimum returned by Optim is not the same value as the value of the objective function when the minimizer returned by Optim is used, see the following MWE.

using Julia
using DifferentialEquations

function f(du,u,p,t)
  du[1] = dx = p[1]*u[1] - u[1]*u[2]
  du[2] = dy = -3*u[2] + u[1]*u[2]

u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [1.5]
prob = ODEProblem(f,u0,tspan,p)
tstops = range(0.,stop=10.,length=10)
sol = solve(prob,Tsit5(),saveat=tstops)

randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(tstops)])
data = convert(Array,randomized)

function least_squares(x)
           _prob = remake(prob, u0=convert.(eltype(x),prob.u0),p=x)
           sol = solve(_prob,Tsit5(),saveat=tstops)
           sum((hcat(sol.u...) .- data).^2)

result = optimize(least_squares, [5.], Newton(),autodiff=:forward)

result.minimum # returns 275.97
least_squares(result.minimizer) # returns 276.68

Please read the first part of this post: Please read: make it easier to help you.

You should provide a minimal working example.

I have added a MWE, see the original post above.

It might be related to the DifferentialEquations pkg as well. I have not seen it in an optimization without the solving of differential equations.

Solving with dual numbers can change the stepping behavior in order to do norm control on the derivative terms, which in turn would cause this.

Thanks for the hint!

I have switche to NLopt and used ForwardDiff for the calculation of the gradient likewise. However, I do not observe the same problem there.

A further example taken from the docs evaluating the gradient via ForwardDiff and Zygote.
They return different results. The difference is not necessarily negligible. I was expecting the same result as they both use AD. Is this related to dual numbers as well, and which gradient is more to trust?

using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots

function lotka_volterra!(du, u, p, t)
     x, y = u
     α, β, δ, γ = p
     du[1] = dx = α*x - β*x*y
     du[2] = dy = -δ*y + γ*x*y
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
tsteps = 0.0:0.1:10.0
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka_volterra!, u0, tspan, p)
function loss(p)
     sol = solve(prob, Tsit5(), p=p, saveat = tsteps)
     loss = sum(abs2, sol.-1)
     return loss, sol
using Zygote, ForwardDiff
Zygote.gradient(x->loss(x)[1],p) # returns [60.889672029001524, -788.8395410840595, 878.197470730419, -2937.299618283181]
ForwardDiff.gradient(x->loss(x)[1],p) # returns [50.90171758717276, -785.0077963154041, 879.7576203004646, -2947.695409344049]

Use a lower solver tolerance?

1 Like