Optim and differential equations?

I’m using Optim with the Fminbox() algorithm to tune some model parameters in a model simulated by DifferentialEquations. In total, the model has 14 scaled parameters. The model fit with initial parameter guesses is:

I’ve created a loss function, and vary each parameter individually in the range [0.3,2], leading to:

Based on this plot, parameter “7” (loss function in element (3,1) of plot matrix) should have a minimum ca. at value 0.6 — if all other parameters are kept at (scaled value) 1.0. I try to find the exact minimum with Optim:

# Optimizing parameter 7, i.e., in plot matrix element (3,1)
loss = (par) -> cost(par,[7])
optimize(loss,[0.5],[1.5],[1.0],Fminbox())

The result is shown below:

Results of Optimization Algorithm
 * Algorithm: Fminbox with L-BFGS
 * Starting Point: [1.0]
 * Minimizer: [0.7577163938258104]
 * Minimum: 6.411025e-02
 * Iterations: 3
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: true 
     |x - x'| = 0.00e+00 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 1.18e+01 
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 500
 * Gradient Calls: 500

What puzzles me, is that Fminbox() suggests a minimum \approx 0.76, which is relatively far from the observed minimum around 0.6 observed in the plot.

What do I do wrong?

Using a local minimizer like L-BFGS usually isn’t a good strategy for this kind of problem. Give a global method like the ones from BlackBoxOptim.jl a try.

2 Likes

Thanks! Will test tomorrow!

Btw: I assume that “local minimizer” = gradient based method. I’ve seen before that computing cost function with variable step length ODE solvers often leads to jagged cost functions even when the “analytic” solution (if it exists) should produce a smooth cost function. Which may be a problem with gradient based methods.

Some people suggest Nelder-Meades based solvers – Optim has a Nelder-Meades solver (for dimension \ge 2), but I’m not sure if it support boxed constraints.

Nelder-Mead should work fine here I think, you can always restart it with random initial conditions and take the best solution to make it more robust. It’s not the most efficient method though, but if your error function is fast to compute, it’s not really an issue.

1 Like

I tried BlackBoxOptim. With a single variable, it is really slow – some 30 seconds to find the minimum that Optim Fminbox took some fractions of a second to – eh – not quite find. With multiple variables, it is really faster – which puzzles me…

In any way, bboptimize seems to do the job in the scalar case. Given that it actually is much faster in the multivariable case, I’m wondering whether it fails to find the optimium, or whether I need to specify more attempts. Also, the resulting data structure is somewhat overwhelming, although I found functions to extract the key information.

Since then, I’ve been told that if I replace the default L-BFGS solver in Fminbox with a GradientDescent solver, maybe that works better.

I should also test the Optim (or other) Nelder-Meads algorithms, but am not sure if Nelder-Meads of Julia supports boxed variables??

1 Like

… ok. seems like using Fminbox(GradientDescent()) may work but be much slower than the default solver of Fminbox() = Fminbox(LBFGS()). And that I can also use Fminbox(NelderMead()). I’ll check tomorrow (since it is late here) how GradientDescent() and NelderMead() works with Optim and Fminbox(), and how this compares to the global solvers in the BlackBoxOptim package.

A local optimization algorithm is one that is only guaranteed to find a local minimum, not a global minimum. It doesn’t necessarily mean that you supply a gradient (although it may approximately compute a gradient internally).

(Of course, there are hybrid approaches, e.g. multistart algorithms that repeatedly call efficient local-optimization algorithms from different starting points.)

2 Likes

Assuming your numerical experiments are correct, have you tried warm-starting the optimizer at the supposedly optimal solution (or an \epsilon away?). This way you can rule out errors in the problem formulation.

1 Like

What bothers me the most here is that we’re handling the fact that the objective couldn’t be decreased incorrectly in Fminbox, which leads to the wrong comment: “Convergence: true”. It did not converge which is why you get this weird result. Notice the number of objective/gradient. For three iterations of Fminbox, that’s on the high side, and I guess that the last attempt failed during the line search. Since you’re going about this on a scalar level, did you try

optimize(loss,0.5,1.5)

?
That will use Brent’s method for univariate optimization. You’ll have to give me cost for me to say more on this.

2 Likes

OK – I’ve tested some options. First, a somewhat nicer plot of the scalar variation in the model fit loss function for my 14 parameters (they have been scaled so that the initial guess would be 1.0):

Next, I consider scalar variation in parameter 7 = loss function in element (2,3) of the plot matrix (“UAs2Fe”).

Using a univariate method (Brent’s method), I get:

julia> loss = (par) -> cost(par,[7])
julia> optimize(loss,0.5,1.5)
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [0.500000, 1.500000]
 * Minimizer: 5.955942e-01
 * Minimum: 3.981820e-02
 * Iterations: 32
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 33

If I use multivariable methods (Fminbox), with default solver I get:

julia> optimize(loss,[0.5],[1.5],[1.0],Fminbox())
Results of Optimization Algorithm
 * Algorithm: Fminbox with L-BFGS
 * Starting Point: [1.0]
 * Minimizer: [0.7577163938258104]
 * Minimum: 6.411025e-02
 * Iterations: 3
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: true 
     |x - x'| = 0.00e+00 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 1.18e+01 
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 500
 * Gradient Calls: 500

Here, the result is quite wrong…

Alternatively, I try to use GradientDescent, leading to failure:

julia> optimize(loss,[0.5],[1.5],[1.0],Fminbox(GradientDescent()))
┌ Warning: Linesearch failed, using alpha = 8.32126246806568e-12 and exiting optimization.
│ The linesearch exited with message:
│ Linesearch failed to converge, reached maximum iterations 50.
└ @ Optim C:\Users\user_name\.julia\packages\Optim\Agd3B\src\utilities\perform_linesearch.jl:47
┌ Warning: f(x) increased: stopping optimization
└ @ Optim C:\Users\user_name\.julia\packages\Optim\Agd3B\src\multivariate\solvers\constrained\fminbox.jl:293

Results of Optimization Algorithm
 * Algorithm: Fminbox with Gradient Descent
 * Starting Point: [1.0]
 * Minimizer: [0.5627850567743929]
 * Minimum: 4.219153e-02
 * Iterations: 2
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 2.67e-11 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 5.59e-11 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 3.21e+00 
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 305
 * Gradient Calls: 305

Even with a crash, the result is better than with the L-BFGS algorithm.

Finally, with the BlackBoxOptim default algorithm:

julia> res = bboptimize(loss; SearchRange = (0.5,1.5), NumDimensions = 1, TraceMode = :silent)
julia> best_candidate(res),best_fitness(res)
([0.596312], 0.03959384553988452)

which is rather similar to that of Brent’s scalar method.

The BlackBoxOptim routine is much slower than those in Optim.

If you are interested in looking into my cost/loss function, I can send the Jupyter notebook + the underlying CSV-file, as long as this is not made public (some unpublished work…).

Sure, I’ll send my e-mail through direct messaging.

I will say the following:

Your pictures show that the first three and the sixth (and the last) look kind of jagged. Are you sure they’re smooth? I will see when you send it, but I would be worried that it’s not as smooth as you think, and this will certainly cause issues.

You are right in that the functions probably are not super smooth. I’ve seen before that loss functions computed from solving ODEs with variable step length tend to have some jaggedness.

So it is possible that this jaggedness causes the local solvers to stop before they find an optimum.

I’m in the process of testing multivariable optimization finding 10 parameters simultaneously. The Gradient Descent, L-BFGS, and the default BlackBoxOptim solvers give quite different solutions :-(. Right now, I’m running the NelderMead algorithm; for some reason, this algorithm seems to be much, much slower than even the BlackBoxOptim solver. I’ll report when my laptop (16GB RAM – too little?) has delivered a solution…

Yes! Yes yes yes :slight_smile:

Edit: yes
Serious edit: all those “ups and downs” will mean that you have tons of local minima, and the only guarantee you have with these methods is that one of them will be found. Is it possible to solve it to a tighter tolerance? (more precise solution)

I don’t think it should be such an issue here, if you look at the axis on dimension 1&2 they are essentially flat, he just zoomed in like crazy.

RAM’s likely not an issue. It’s because it’s proceeding serially, one ODE solve at a time, thousands of times.

You might want to reduce the tolerance a bunch after you’re close to the minimum, and then get it to go a little more.

Here are some results with the different optimizers with 10 parameters:

julia> # Using Optim + L-BFGS
julia> loss = (par) -> cost(par,p_cand)
julia> @time res10_lbfgs = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox());
 9.488374 seconds (120.21 M allocations: 4.825 GiB, 13.05% gc time)
julia> Optim.minimizer(res10_lbfgs)
10-element Array{Float64,1}:
 1.0892319595180837
 0.9224262335052348
 0.5883927466637747
 0.978128572405003 
 0.9315007544778905
 1.1011157712133484
 0.9069204188069746
 1.0536679659377284
 1.0970171891856608
 1.103414236344359

Next:

julia> # Using Optim + GradientDescent
julia> @time res10_gd = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox(GradientDescent()));
7.616987 seconds (103.95 M allocations: 4.346 GiB, 13.84% gc time)
julia> Optim.minimizer(res10_gd)
10-element Array{Float64,1}:
 1.0297607064111505
 0.9717089026600397
 0.9701309719301778
 0.9876652956389227
 0.9747171511101008
 0.9720729744237891
 1.0240835871987013
 1.0256016667497516
 0.9734105609089325
 0.971583538296498

Next:

# Using Optim + NelderMead
julia> @time res10_nm = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox(NelderMead()));
1476.697756 seconds (26.13 G allocations: 1.021 TiB, 15.27% gc time)
julia> Optim.minimizer(res10_nm)
10-element Array{Float64,1}:
 1.4433107238439118
 1.4595059863802011
 0.5872514501712995
 0.9845185140096595
 1.0806856155599496
 1.2896505935597573
 1.1001430661997713
 1.2599199344489431
 1.2835038180590241
 1.5390862317949026

Next:

julia> # Using BlackBoxOptim
julia> @time res10_bb = bboptimize(loss; SearchRange = (0.3,2.0), NumDimensions = 10, TraceMode = :silent);
26.376395 seconds (499.54 M allocations: 20.009 GiB, 16.04% gc time)
julia> best_candidate(res10_bb)
10-element Array{Float64,1}:
 1.7806061305131693
 1.850768850577829 
 0.5869654613812466
 1.1078252447426908
 1.8256324607370495
 0.7083605185266063
 1.4639428229868445
 1.4931612890482668
 0.9388103412516573
 1.5240243421627442

Finally, comparing the loss of model fit:

julia> Optim.minimum(res10_lbfgs), Optim.minimum(res10_gd), Optim.minimum(res10_nm), best_fitness(res10_bb)
(0.035504796897362405, 0.07465169329618884, 0.030234241700558218, 0.02926037639603645)

To me, the most remarkable observation here is the horrible performance/time consumption of the Nelder-Mead algorithm.

One strange thing is that the BlackBoxOptim algorithm for 10 variables in fact is considerable faster than the univariate case. WHY?

Finally, the resulting parameter values are quite different. The final cost function values for the gradient based solvers are quite similar, and markedly poorer than the results of the non-gradient based methods.

An interesting question here is the degree of “identifiability” of the parameters. This could probably be studied using some profile likelihood methods or related methods, or more theoretical methods – but the more theoretical methods are tricky…

Where’s the univariate minimizer in the very first panel of the first figure (in the first post in this thread)? Around 0.25, right? But start BFGS (or any of the methods in Optim) at 1.5 and they will generally not make it down below 1.0 (unless by a lucky initial direction). That was just my point.

That said, yes. Several of these seem flat (at least for the chosen values of the other parameters), and this is an indication that you are not able to identify the parameters of interest in this context/with this data.