I am having a strange problem with ForwardDiff, which Optim uses internally. The minimum reported by Optim differs from the value of the objective function evaluated at the minimizer that Optim returns; see the following MWE.
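using DifferentialEquations, Optim, RecursiveArrayTools  # RecursiveArrayTools provides VectorOfArray
function f(du,u,p,t)
  du[1] = dx = p[1]*u[1] - u[1]*u[2]
  du[2] = dy = -3*u[2] + u[1]*u[2]
end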
u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [1.5]
prob = ODEProblem(f,u0,tspan,p)
tstops = range(0.,stop=10.,length=10)
sol = solve(prob,Tsit5(),saveat=tstops)
randomized = VectorOfArray([(sol(tstops[i]) + .01randn(2)) for i in 1:length(tstops)])
data = convert(Array,randomized)
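function least_squares(x)
  # rebuild the problem so that u0 and p share the element type of x (Float64 or Dual)
  _prob = remake(prob, u0=convert.(eltype(x),prob.u0),p=x)
  sol = solve(_prob,Tsit5(),saveat=tstops)
  sum((hcat(sol.u...) .- data).^2)
end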
result = optimize(least_squares, [5.], Newton(),autodiff=:forward)
result.minimum # returns 275.97
least_squares(result.minimizer) # returns 276.68
Please read the first part of this post: Please read: make it easier to help you.
You should provide a minimal working example.
I have added a MWE, see the original post above.
It might be related to the DifferentialEquations pkg as well. I have not seen it in an optimization without the solving of differential equations.
Solving with dual numbers can change the adaptive stepping behavior, because the error norm then also controls the derivative terms; that in turn would cause this discrepancy.
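One way to see this effect is to solve the MWE system once with Float64 values and once with dual numbers, then compare how many steps the adaptive solver took. The following is only a sketch: it assumes OrdinaryDiffEq and ForwardDiff are installed, and the names `sol_float`, `sol_dual` and `p_dual` are made up for illustration. The two step counts may or may not differ, depending on package versions and tolerances.

```julia
using OrdinaryDiffEq, ForwardDiff

# Same right-hand side as in the MWE above
function f(du, u, p, t)
    du[1] = p[1]*u[1] - u[1]*u[2]
    du[2] = -3*u[2] + u[1]*u[2]
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)

# Plain Float64 solve
sol_float = solve(ODEProblem(f, u0, tspan, [1.5]), Tsit5())

# Same problem, with the parameter seeded as a dual number (derivative part 1.0)
p_dual = [ForwardDiff.Dual(1.5, 1.0)]
u0_dual = convert.(eltype(p_dual), u0)
sol_dual = solve(ODEProblem(f, u0_dual, tspan, p_dual), Tsit5())

# Number of accepted time points in each solve
println("Float64 solve: ", length(sol_float.t), " time points")
println("Dual solve:    ", length(sol_dual.t), " time points")

# Primal part of the dual solution at the final time, for comparison
println(sol_float.u[end], " vs ", ForwardDiff.value.(sol_dual.u[end]))
```

If the step counts differ, the primal values computed during differentiation are not bit-identical to those of a plain Float64 evaluation, which would be consistent with `result.minimum` and `least_squares(result.minimizer)` disagreeing.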
Thanks for the hint!
I have switched to NLopt and likewise used ForwardDiff to compute the gradient. However, I do not observe the same problem there.
A further example, taken from the docs, evaluates the gradient via Zygote and via ForwardDiff. They return different results, and the difference is not necessarily negligible. I was expecting the same result, as both use AD. Is this related to dual numbers as well, and which gradient is more trustworthy?
using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots
function lotka_volterra!(du, u, p, t)
x, y = u
α, β, δ, γ = p
du[1] = dx = α*x - β*x*y
du[2] = dy = -δ*y + γ*x*y
end
u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
tsteps = 0.0:0.1:10.0
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka_volterra!, u0, tspan, p)
function loss(p)
  sol = solve(prob, Tsit5(), p=p, saveat = tsteps)
  loss = sum(abs2, sol.-1)
  return loss, sol
end
using Zygote, ForwardDiff
# loss returns a (loss, sol) tuple, so differentiate only its first element
Zygote.gradient(x->loss(x)[1],p) # returns [60.889672029001524, -788.8395410840595, 878.197470730419, -2937.299618283181]
ForwardDiff.gradient(x->loss(x)[1],p) # returns [50.90171758717276, -785.0077963154041, 879.7576203004646, -2947.695409344049]
Use a lower solver tolerance?
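A minimal sketch of what that could look like, assuming OrdinaryDiffEq and ForwardDiff are installed. `abstol` and `reltol` are the standard keyword arguments of `solve`; the `tol` keyword on `loss` and the names `g_loose` and `g_tight` are hypothetical and only used here for illustration. This version of `loss` returns just the scalar so that it can be passed to `ForwardDiff.gradient` directly.

```julia
using OrdinaryDiffEq, ForwardDiff

function lotka_volterra!(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = α*x - β*x*y
    du[2] = -δ*y + γ*x*y
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
tsteps = 0.0:0.1:10.0
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka_volterra!, u0, tspan, p)

# `tol` is a hypothetical helper argument that sets both tolerances at once
function loss(p; tol = 1e-3)
    # convert u0 so that state and parameters share one element type
    _prob = remake(prob, u0 = convert.(eltype(p), prob.u0), p = p)
    sol = solve(_prob, Tsit5(), saveat = tsteps, abstol = tol, reltol = tol)
    return sum(abs2, Array(sol) .- 1)
end

g_loose = ForwardDiff.gradient(x -> loss(x; tol = 1e-3), p)
g_tight = ForwardDiff.gradient(x -> loss(x; tol = 1e-10), p)

println("loose tolerances: ", g_loose)
println("tight tolerances: ", g_tight)
```

If the gradients from different AD backends move closer together as `tol` shrinks, the gap is most likely solver error rather than a bug in either backend.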