I am pretty new to Julia and SciML, and I have a question about how to specify the maximum number of iterations in Optimization.jl. I read the documentation and tried different combinations of maxiters, iterations, etc., but so far nothing works. For example, in the following line of code I tried to set maxiters=400 and iterations=500, yet the optimizer has run for about 1000 iterations so far. These keyword arguments seem to be ignored, so I must have misunderstood the documentation somehow.

res = Optimization.solve(optprob, Optim.BFGS(), maxiters=400; iterations=500)

The (L-)BFGS solver (Optim.jl, julianlsolvers.github.io) requires the gradient to be calculated at every step. Calculating the gradient requires an additional evaluation of the function being minimized, to work out which direction the next guess should move in. So if you set a particular number N of iterations of the optimization process and use a gradient-based solver, you should expect the function to be evaluated about 2N times.
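To see where the 2N comes from, here is a toy sketch that counts evaluations. It uses a hypothetical one-variable objective and a one-sided finite-difference gradient inside a plain gradient-descent loop; it is not what Optim.jl's BFGS actually does, just an illustration of the bookkeeping:

```julia
# Toy illustration: count objective evaluations during a simple
# gradient descent (hypothetical f; this is NOT Optim.jl's BFGS).
ncalls = Ref(0)
f(x) = (ncalls[] += 1; (x - 3.0)^2)

# One-sided finite difference: two calls of f per gradient estimate.
fd_grad(f, x; h = 1e-6) = (f(x + h) - f(x)) / h

function descend(f, x0; iters = 10, lr = 0.1)
    x = x0
    for _ in 1:iters
        x -= lr * fd_grad(f, x)   # each iteration evaluates f twice
    end
    return x
end

xmin = descend(f, 0.0)
println(ncalls[])   # 20: two evaluations per iteration for N = 10
```

With N = 10 iterations the counter ends at 20, i.e. twice the number of iterations you asked for.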

Thanks for your quick reply. I am still confused. For (L-)BFGS, the optimization formula is

x_{n+1} = x_{n} - P^{-1} \nabla f(x_n)

What I really want is to put an upper limit on n. My ODE problem f(x) is very expensive to solve. Sometimes I just want a "good enough" parameter estimate after, say, n = 100 iterations. "maxiters" does not seem to work.
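Conceptually, all I am after is a hard cap on the loop, something like this toy gradient-descent sketch (here P = I and the gradient is a hypothetical analytic one, not a real BFGS inverse-Hessian update):

```julia
# Minimal sketch of capping n: plain gradient descent with P = I,
# for the toy objective f(x) = sum over i of (x_i - 1)^2.
grad(x) = 2 .* (x .- 1.0)            # analytic gradient of the toy f

function minimize(x0; maxiters = 100, lr = 0.1, tol = 1e-8)
    x = copy(x0)
    for n in 1:maxiters              # hard upper limit on n
        g = grad(x)
        sqrt(sum(abs2, g)) < tol && return x, n   # converged early
        x .-= lr .* g                # x_{n+1} = x_n - grad f(x_n)
    end
    return x, maxiters               # "good enough" answer after maxiters
end

x, n = minimize(zeros(2); maxiters = 100)
```

Whatever the state of convergence, the loop never runs more than maxiters times.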

That formula requires knowledge of \nabla f(x_n), the gradient of your function at some particular parameter x_n. So, if the solver is currently situated at some point x_n in parameter space, it needs to calculate the gradient at its local position in order to determine where x_{n+1} should be. How does the solver calculate this gradient? Commonly:

1. Most solvers allow a gradient function \nabla f to be explicitly provided by the user, if you have one that is efficient to compute. Using this, each iteration requires one evaluation of f and one evaluation of the provided \nabla f.

2. A finite difference method could be used to estimate \nabla f, but now you need to compute at minimum f(x_n) and f(x_n + \Delta x) just to get an estimate of the gradient. Using this, each iteration requires at least 3 evaluations of f.

3. The default option: use automatic differentiation to calculate the actual \nabla f (to approximately machine precision) for the cost of only a single additional evaluation of f. Using this, each iteration requires only 2 evaluations of f.
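The "one extra evaluation" in the automatic differentiation case can be seen with a toy forward-mode AD built on dual numbers. This is a sketch of the idea only, not what ForwardDiff.jl actually implements:

```julia
# Toy forward-mode AD: a single pass through f returns both f(x)
# and f'(x), which is why AD costs roughly one extra evaluation.
struct Dual
    val::Float64   # function value
    der::Float64   # derivative carried alongside it
end
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:*(c::Real, b::Dual) = Dual(c * b.val, c * b.der)

f(x) = 3 * x * x - 2 * x     # f'(x) = 6x - 2

y = f(Dual(2.0, 1.0))        # seed the derivative dx/dx = 1
# y.val == 8.0 (f(2)) and y.der == 10.0 (f'(2)), from one call to f
```

The value and the exact derivative come out of the same pass, rather than from repeated evaluations of f at perturbed points.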

If this second evaluation of f is too costly for you, then you'll need to either figure out a way to compute the gradient more efficiently, or consider selecting an optimization algorithm that doesn't require gradients: see the list of Gradient-Free algorithms in the left-side column here.
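To illustrate what "gradient-free" buys you, here is a naive coordinate-search sketch: every step evaluates only f itself, never a gradient. Real derivative-free solvers (e.g. Nelder-Mead) are far more refined; this is just to show the evaluation pattern:

```julia
# Sketch of a derivative-free approach (naive coordinate search):
# probe each coordinate in both directions and keep any improvement.
function coordinate_search(f, x0; step = 0.5, maxiters = 200)
    x, fx = copy(x0), f(x0)
    for _ in 1:maxiters
        improved = false
        for i in eachindex(x), s in (step, -step)
            trial = copy(x)
            trial[i] += s
            ft = f(trial)                   # only f, no gradient
            ft < fx && ((x, fx, improved) = (trial, ft, true))
        end
        improved || (step /= 2)             # stuck: try a finer step
        step < 1e-8 && break
    end
    return x, fx
end

rosen(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
xbest, fbest = coordinate_search(rosen, [-1.0, 1.0])
```

The trade-off is that such methods typically need many more iterations than gradient-based ones, but each iteration is cheap when f itself is the only cost.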

# Update current position
state.dx .= state.alpha.*state.s
state.x .= state.x .+ state.dx

which is more or less the equation that you wrote. This part is controlled by options.iterations. Depending on the solver you use, the options iterations and maxiters can write to the same variable; you can see this in the function __map_optimizer_args in your installed OptimizationOptimJL package. For this particular solver, the final value of options.iterations is given by maxiters.
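So passing maxiters as a keyword argument to solve should cap BFGS at the requested number of iterations. A sketch, assuming recent versions of Optimization.jl and OptimizationOptimJL and using the standard Rosenbrock example in place of your expensive ODE objective:

```julia
using Optimization, OptimizationOptimJL, ForwardDiff

# Toy stand-in for an expensive objective; p holds the parameters.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
optprob = OptimizationProblem(optf, [0.0, 0.0], [1.0, 100.0])

# maxiters goes after the semicolon as a keyword argument to solve;
# __map_optimizer_args forwards it to Optim's options.iterations.
res = Optimization.solve(optprob, Optim.BFGS(); maxiters = 100)
```

With the keyword in that position, BFGS stops after at most 100 iterations even if it has not yet converged.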