A few suggestions:
- Make sure the gradient is correct. Many of the times I have struggled with gradient-based optimisation algorithms, the gradient turned out to be wrong. So define the cost function and check that it differentiates correctly by comparing the output of an AD package against finite differences (first sketch after this list). There might be an AD bug; unlikely, but not impossible.
- Try algorithms other than `BFGS`. If your cost function’s curvature changes often, `BFGS` is likely a bad choice here because it tries to capture “global curvature information” in the approximate inverse Hessian, which can be complete gibberish when the curvature shifts frequently. `GradientDescent` and `ConjugateGradient` are two alternatives I would try (second sketch below).
- Benchmark your function and its gradient, and check for type instabilities with `Float64` inputs and with `ForwardDiff.Dual` inputs (third sketch below). It’s possible that your function is type stable when run with one input type but not when run with another.
- Consider using reverse-mode AD to define the gradient if computing it with forward mode is too slow. You can pass the gradient function explicitly to Optim (fourth sketch below).
- Loosen the tolerance, as Chris suggests above, and see how loose it is allowed to be while still converging to a reasonable solution (last sketch below).
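
For the gradient check, here’s a minimal sketch. The `f` below is a hypothetical stand-in for your cost function, and `fd_gradient` is a helper I made up for illustration; any finite-difference package would do the same job:

```julia
using ForwardDiff

# Hypothetical stand-in for the real cost function
f(x) = sum(abs2, x .- 1) + 0.1 * sum(sin, 3 .* x)

# Independent check: central finite differences, no AD involved
function fd_gradient(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

x0 = randn(5)
# If the AD gradient is right, this should be around 1e-8 or smaller
maximum(abs, ForwardDiff.gradient(f, x0) - fd_gradient(f, x0))
```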
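
Trying the other algorithms is then a one-line change per solver. Reusing the placeholder `f` and `x0` from the sketch above; `autodiff = :forward` tells Optim to build the gradient with ForwardDiff:

```julia
using Optim

f(x) = sum(abs2, x .- 1) + 0.1 * sum(sin, 3 .* x)  # placeholder, as above
x0 = randn(5)

for alg in (GradientDescent(), ConjugateGradient(), BFGS())
    res = optimize(f, x0, alg; autodiff = :forward)
    println(nameof(typeof(alg)), ": f = ", Optim.minimum(res),
            ", iterations = ", Optim.iterations(res))
end
```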
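
For the benchmarking and type-stability check, something like the following. Building `Dual` numbers with a single partial each is a simplification of what `ForwardDiff.gradient` actually feeds your function (it uses chunks of partials), but it is usually enough to expose an instability:

```julia
using BenchmarkTools, ForwardDiff
using Test: @inferred

f(x) = sum(abs2, x .- 1) + 0.1 * sum(sin, 3 .* x)  # placeholder, as above

x0 = randn(5)
xdual = ForwardDiff.Dual.(x0, one.(x0))  # Dual inputs, one partial per element

@btime f($x0)
@btime f($xdual)

# @inferred throws if the return type cannot be inferred;
# @code_warntype f(x0) gives a more detailed view
@inferred f(x0)
@inferred f(xdual)
```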
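
For the explicit gradient, a sketch using Zygote for reverse mode (that package choice is my assumption; ReverseDiff would work the same way). The in-place `g!(G, x)` signature is what Optim expects:

```julia
using Optim, Zygote

f(x) = sum(abs2, x .- 1) + 0.1 * sum(sin, 3 .* x)  # placeholder, as above
x0 = randn(5)

# In-place gradient via reverse-mode AD
function g!(G, x)
    G .= Zygote.gradient(f, x)[1]
    return G
end

res = optimize(f, g!, x0, BFGS())
```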
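
And for the tolerance, `Optim.Options(g_tol = ...)` sets the gradient-norm convergence criterion; sweeping it is a quick way to see how loose you can go:

```julia
using Optim

f(x) = sum(abs2, x .- 1) + 0.1 * sum(sin, 3 .* x)  # placeholder, as above
x0 = randn(5)

for tol in (1e-8, 1e-6, 1e-4, 1e-2)
    res = optimize(f, x0, BFGS(), Optim.Options(g_tol = tol); autodiff = :forward)
    println("g_tol = ", tol, " -> f = ", Optim.minimum(res),
            ", converged = ", Optim.converged(res))
end
```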