I’m using Optimization.jl and Optim.jl to solve an optimization problem for which I have gradient information (via AD). What I’ve found is that the gradient-based methods perform much worse than the derivative-free ones: they appear to get stuck almost immediately, despite a large gradient norm.

Why is this? I would expect the gradient-based methods to get stuck at a point in parameter space where the derivative is zero, but here the gradient is far from zero. If they’re not moving, is the line-search portion of the algorithm deciding to take an infinitesimally small step?

Here is a snippet of the run script:

```julia
optf = OptimizationFunction(quasisym, Optimization.AutoForwardDiff());
prob_x0 = OptimizationProblem(optf, x0, params);
qs_x0 = quasisym(x0, params)
@time sol_x0_nm = solve(prob_x0, NelderMead(); maxtime = 3600);
@time sol_x0_bfgs = solve(prob_x0, BFGS(); maxtime = 3600);
@time sol_x0_lbfgs = solve(prob_x0, LBFGS(); maxtime = 3600);
```
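As a sanity check on the “large gradient” claim, this is roughly how I verify the AD gradient at the starting point (using ForwardDiff directly, the same backend that `AutoForwardDiff()` uses; `quasisym`, `x0`, and `params` are defined earlier in my script):

```julia
using ForwardDiff, LinearAlgebra

# Evaluate the gradient of the objective at the initial guess x0
# and report its norm, to confirm we are not starting at (or near)
# a stationary point.
g0 = ForwardDiff.gradient(x -> quasisym(x, params), x0)
@show norm(g0)
```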

This produces the following solutions. I’ve also included a plot at the bottom, showing that Nelder-Mead is the only method that makes significant progress.

```
julia> qs_x0
2.193829699196393
julia> sol_x0_nm.original
* Status: failure (reached maximum number of iterations)
* Candidate solution
Final objective value: 2.465799e-03
* Found with
Algorithm: Nelder-Mead
* Convergence measures
√(Σ(yᵢ-y)²)/n ≰ 1.0e-08
* Work counters
Seconds run: 1590 (vs limit 3600)
Iterations: 1000
f(x) calls: 5812
julia> sol_x0_bfgs.original
* Status: success
* Candidate solution
Final objective value: 2.166061e+00
* Found with
Algorithm: BFGS
* Convergence measures
|x - x'| = 5.29e-23 ≰ 0.0e+00
|x - x'|/|x'| = 5.29e-23 ≰ 0.0e+00
|f(x) - f(x')| = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
|g(x)| = 3.30e+05 ≰ 1.0e-08
* Work counters
Seconds run: 3569 (vs limit 3600)
Iterations: 8
f(x) calls: 360
∇f(x) calls: 360
julia> sol_x0_lbfgs.original
* Status: success
* Candidate solution
Final objective value: 2.189750e+00
* Found with
Algorithm: L-BFGS
* Convergence measures
|x - x'| = 8.27e-25 ≰ 0.0e+00
|x - x'|/|x'| = 8.27e-25 ≰ 0.0e+00
|f(x) - f(x')| = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
|g(x)| = 8.81e+05 ≰ 1.0e-08
* Work counters
Seconds run: 2905 (vs limit 3600)
Iterations: 7
f(x) calls: 312
∇f(x) calls: 312
```
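If the line search is indeed the culprit, is the fix to swap it out? An untested sketch of what I was planning to try, passing a `BackTracking` line search from LineSearches.jl to Optim’s `BFGS` via its `linesearch` keyword:

```julia
using Optim, LineSearches

# Re-run BFGS with a backtracking line search instead of the default
# Hager-Zhang, in case the default line search is stalling on this
# objective. prob_x0 is the OptimizationProblem from the snippet above.
@time sol_x0_bt = solve(prob_x0, BFGS(linesearch = LineSearches.BackTracking());
                        maxtime = 3600);
```

Would that be the recommended first thing to try, or is there a better way to diagnose what the line search is doing?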