How to properly specify maximum iterations in Optimization.jl?

Hello,

I am pretty new to Julia and SciML. I have a question about how to specify the maximum number of iterations in Optimization.jl. I read the documentation and tried different combinations of maxiters, iterations, etc., but so far it does not work. For example, in the following line of code, I tried to set maxiters=400 and iterations=500, but the optimizer has run to about 1000 iterations so far. These keyword arguments seem to be ignored. I must have misunderstood the documentation somehow.

res = Optimization.solve(optprob, Optim.BFGS(), maxiters=400; iterations=500)

Thanks.

The (L-)BFGS - Optim.jl (julianlsolvers.github.io) solver requires the gradient to be calculated at every step. Calculating the gradient typically requires an additional evaluation of the function being minimized, to determine which direction the next guess should move in. So, if you set a particular number N of iterations of the optimization process and use a gradient-based solver, you should expect that function to be evaluated at least 2N times.
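To make this concrete, here is a minimal sketch that counts objective evaluations (the toy quadratic, the Ref counter, and the exact call pattern are illustrative assumptions, not your actual problem):

    using Optimization, OptimizationOptimJL, ForwardDiff

    evals = Ref(0)  # crude global counter of objective calls (illustrative only)
    f(x, p) = (evals[] += 1; sum(abs2, x .- p))  # stand-in for an expensive objective

    # Build the gradient with forward-mode AD; dual-number calls also hit the counter.
    optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(optf, zeros(2), [1.0, 2.0])
    res = solve(prob, Optim.BFGS(); maxiters = 10)

    @show evals[]  # noticeably more than 10: gradient and line-search calls add up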


Thanks for your quick reply. I am still confused. For (L-)BFGS, the optimization update is

x_{n+1} = x_{n} - P^{-1} \nabla f(x_n)

What I really want is to put an upper limit on n. My ODE problem f(x) is very expensive to solve. Sometimes I just want a “good enough” parameter estimate after, say, n = 100 iterations. maxiters does not seem to work.

That formula requires knowledge of \nabla f(x_n), the gradient of your function at some particular parameter x_n. So, if the solver is currently situated at some point x_n in parameter space, it needs to calculate the gradient at its local position in order to determine where x_{n+1} should be. How does the solver calculate this gradient? Commonly:

  • Most solvers allow a gradient function \nabla f to be explicitly provided by the user, if you have one that is efficient to compute. With this, each iteration requires one evaluation of f and one evaluation of the provided \nabla f.
  • A finite difference method could be used to estimate \nabla f, but then you need f(x_n) plus f(x_n + \Delta x) in each coordinate direction just to estimate the gradient, so for d parameters each iteration requires at least d + 1 evaluations of f.
  • The default option: use automatic differentiation to calculate the actual \nabla f (to approximately within machine precision) for the cost of roughly one additional evaluation of f. With this, each iteration requires only about 2 evaluations of f. (See the sketch after this list.)
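For instance, with Optimization.jl the first and third options might look like this (a sketch using the Rosenbrock function as a stand-in for your expensive ODE objective; the hand-written grad! is hypothetical):

    using Optimization, OptimizationOptimJL, ForwardDiff

    rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

    # Default-style route: let automatic differentiation supply the gradient.
    optf_ad = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())

    # Explicit route: provide your own in-place gradient if you have a cheap one.
    function grad!(G, x, p)
        G[1] = -2 * (p[1] - x[1]) - 4 * p[2] * (x[2] - x[1]^2) * x[1]
        G[2] = 2 * p[2] * (x[2] - x[1]^2)
        return G
    end
    optf_manual = OptimizationFunction(rosenbrock; grad = grad!)

    prob = OptimizationProblem(optf_ad, [0.0, 0.0], [1.0, 100.0])
    res = solve(prob, Optim.BFGS(); maxiters = 100)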

If this second evaluation of f is too costly for you, then you’ll need to either figure out a way to compute the gradient more efficiently, or consider selecting an optimization algorithm that doesn’t require a gradient: see the list of gradient-free algorithms in the left-side column here.
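As a sketch of that route (Nelder-Mead chosen arbitrarily from the gradient-free list; the toy objective is again a stand-in):

    using Optimization, OptimizationOptimJL

    f(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2  # stand-in objective
    prob = OptimizationProblem(f, [0.0, 0.0], [1.0, 100.0])

    # NelderMead evaluates only f itself, so no gradient (and no AD backend) is needed.
    res = solve(prob, Optim.NelderMead(); maxiters = 100)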

I think mike is correct, but to elaborate on the answer: I believe that you are using Optim.jl under the hood, so you can see the code here: Optim.jl/src/multivariate/optimize/optimize.jl at master · JuliaNLSolvers/Optim.jl · GitHub, around line 52

    while !converged && !stopped && iteration < options.iterations
        iteration += 1
        ls_success = !update_state!(d, state, method)

and the update_state! for the solver that you are using, in this case BFGS: Optim.jl/src/multivariate/solvers/first_order/bfgs.jl at master · JuliaNLSolvers/Optim.jl · GitHub, around line 141

    # Update current position
    state.dx .= state.alpha.*state.s
    state.x .= state.x .+ state.dx

which is more or less the equation that you wrote. So this part is controlled by options.iterations. The iterations and maxiters options can write to the same underlying variable, depending on the solver you use; you can see this in your installed OptimizationOptimJL package, in the function __map_optimizer_args. For this particular solver, the final value of options.iterations is given by maxiters.
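So, if my reading of __map_optimizer_args is right, passing maxiters = 400 through Optimization.jl ends up roughly equivalent to calling Optim.jl directly like this (toy objective; a sketch, not the package’s exact internals):

    using Optim

    # `maxiters = 400` on the Optimization.jl side becomes `iterations = 400` here.
    result = Optim.optimize(x -> sum(abs2, x .- 1), zeros(2), BFGS(),
                            Optim.Options(iterations = 400))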

Now, I think the arguments that you care about are f_calls_limit, g_calls_limit, and h_calls_limit, which relate to mike’s answer. You can see them in action here: Optim.jl/src/multivariate/optimize/optimize.jl at master · JuliaNLSolvers/Optim.jl · GitHub, around line 79

        f_limit_reached = options.f_calls_limit > 0 && f_calls(d) >= options.f_calls_limit ? true : false
        g_limit_reached = options.g_calls_limit > 0 && g_calls(d) >= options.g_calls_limit ? true : false
        h_limit_reached = options.h_calls_limit > 0 && h_calls(d) >= options.h_calls_limit ? true : false

So do something like

res = solve(prob, Optim.BFGS(); maxiters = 400, f_calls_limit = 2000)
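To check that the limits actually took effect, you can inspect the wrapped Optim.jl result via res.original (a sketch; the field names below come from Optim’s MultivariateOptimizationResults):

    res.original.iterations  # BFGS iterations actually taken (at most 400 here)
    res.original.f_calls     # objective evaluations, capped near 2000 by the limit above
    res.original.g_calls     # gradient evaluations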