ForwardDiff evaluation of objective in Optim returns wrong result (objective uses DifferentialEquations pkg)

moesphere · February 4, 2021, 3:44pm

I am having a strange problem with ForwardDiff that is internally used in Optim. The minimum returned by Optim is not the same value as the value of the objective function when the minimizer returned by Optim is used, see the following MWE.

using Julia
using DifferentialEquations

function f(du,u,p,t)
  du[1] = dx = p[1]*u[1] - u[1]*u[2]
  du[2] = dy = -3*u[2] + u[1]*u[2]
end

u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [1.5]
prob = ODEProblem(f,u0,tspan,p)
tstops = range(0.,stop=10.,length=10)
sol = solve(prob,Tsit5(),saveat=tstops)

randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(tstops)])
data = convert(Array,randomized)

function least_squares(x)
           _prob = remake(prob, u0=convert.(eltype(x),prob.u0),p=x)
           sol = solve(_prob,Tsit5(),saveat=tstops)
           sum((hcat(sol.u...) .- data).^2)
end

result = optimize(least_squares, [5.], Newton(),autodiff=:forward)

result.minimum # returns 275.97
least_squares(result.minimizer) # returns 276.68

odow · February 4, 2021, 6:46pm

Please read the first part of this post: Please read: make it easier to help you.

You should provide a minimal working example.

moesphere · February 5, 2021, 12:32pm

I have added a MWE, see the original post above.

moesphere · February 6, 2021, 1:28pm

It might be related to the DifferentialEquations pkg as well. I have not seen it in an optimization without the solving of differential equations.

ChrisRackauckas · February 6, 2021, 4:01pm

Solving with dual numbers can change the stepping behavior in order to do norm control on the derivative terms, which in turn would cause this.

moesphere · February 7, 2021, 4:37pm

Thanks for the hint!

I have switche to NLopt and used ForwardDiff for the calculation of the gradient likewise. However, I do not observe the same problem there.

moesphere · March 10, 2021, 3:10pm

A further example taken from the docs evaluating the gradient via ForwardDiff and Zygote.
They return different results. The difference is not necessarily negligible. I was expecting the same result as they both use AD. Is this related to dual numbers as well, and which gradient is more to trust?

using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots

function lotka_volterra!(du, u, p, t)
     x, y = u
     α, β, δ, γ = p
     du[1] = dx = α*x - β*x*y
     du[2] = dy = -δ*y + γ*x*y
end
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
tsteps = 0.0:0.1:10.0
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka_volterra!, u0, tspan, p)
function loss(p)
     sol = solve(prob, Tsit5(), p=p, saveat = tsteps)
     loss = sum(abs2, sol.-1)
     return loss, sol
end
using Zygote, ForwardDiff
Zygote.gradient(x->loss(x)[1],p) # returns [60.889672029001524, -788.8395410840595, 878.197470730419, -2937.299618283181]
ForwardDiff.gradient(x->loss(x)[1],p) # returns [50.90171758717276, -785.0077963154041, 879.7576203004646, -2947.695409344049]

ChrisRackauckas · March 10, 2021, 3:38pm

Use a lower solver tolerance?

Topic		Replies	Views
Unknown error from ForwardDiff Optimization (Mathematical)	1	470	December 10, 2019
Using ForwardDiff to differentiate an ODE with respect to some of its parameters Modelling & Simulations	5	624	April 10, 2020
How to use ForwardDiff with NLopt? Optimization (Mathematical) optimization , forwarddiff	9	2053	May 15, 2022
How to troubleshoot ForwardDiff Optimization (Mathematical)	18	505	June 12, 2023
Optim: What optimiser is best if your gradient computation is slow? Optimization (Mathematical) optim	13	3153	August 22, 2017

ForwardDiff evaluation of objective in Optim returns wrong result (objective uses DifferentialEquations pkg)

Related topics