I should have mentioned this earlier; I wanted to do it myself but couldn’t really get it to work in the time I had available. Also, before looking at the actual code, I thought R-Optim.NelderMead in the plot was Nelder-Mead from R’s optim.
You may be interested in this example of using simulated annealing to estimate a small DSGE model by GMM: https://github.com/mcreel/Econometrics/blob/master/Examples/DSGE/GMM/DoGMM.jl
Some discussion of this is found in the document econometrics.pdf, which is in the top level of the repo that contains this example. In my experience, Bayesian GMM-style methods are more reliable for this sort of model.
If the function evaluation is expensive, you may also be interested in BayesOpt.jl (https://github.com/jbrea/BayesOpt.jl).
I’ve added SAMIN and made a couple of changes; the best algorithms now seem to be some of the BlackBoxOptim.jl ones. They don’t have very good final convergence, so I chained them into Nelder-Mead, which improved their performance in the benchmark.
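Conceptually the chaining is just a global search followed by a local polish; a minimal sketch (stand-in objective, placeholder dimension and budget, not the actual benchmark code):

using BlackBoxOptim, Optim
f(x) = sum(abs2, x)                           # stand-in objective
D = 3                                         # placeholder dimension
# global phase: BlackBoxOptim's default adaptive differential evolution
res = bboptimize(f; SearchRange = (-5.5, 5.5), NumDimensions = D,
                 MaxFuncEvals = 10_000, TraceMode = :silent)
# local phase: polish the best candidate with Nelder-Mead
polished = Optim.optimize(f, best_candidate(res), NelderMead())
Optim.minimizer(polished), Optim.minimum(polished)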
I should run the benchmark with more repeats and at higher dimension, but the Python optimisers are so slow that it already takes quite a long time to complete. There may also still be some issues with the options.
Thanks for looking at this again! Do you run this on v1.0.x?
Edit: Just had a look; yeah, this seems much more in line with what I’d expect. SAMIN in action, @mcreel! It does quite well considering the winners are designed to handle very tough landscapes.
Depending on how irregular the test functions are, SAMIN’s performance could possibly be improved by setting the temperature reduction factor (rt), which is 0.9 by default, to a lower value, e.g., 0.5. This speeds it up but increases the chance of missing the global optimum. The default 0.9 is pretty conservative, designed to deal with challenging functions.
Some are quite tough I’d say, although you cannot really put a number on toughness: http://coco.lri.fr/downloads/download15.01/bbobdocfunctions.pdf
I agree that toughness is hard to quantify without experimentation. When I use SAMIN, I run it several times to verify that I always get the same final result. If I don’t, I set rt closer to 1, and start again. But doing that in a benchmarking framework would not be fun. I just wanted to point out that the default value is not necessarily a good choice for these test functions.
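The check I mean is something like the sketch below (placeholder objective and bounds; rt and the budget are illustrative): run SAMIN from a few random starts, compare the minima, and if they differ, set rt closer to 1 and rerun.

using Optim
f(x) = sum(abs2, x)                              # placeholder objective
lb, ub = fill(-5.5, 3), fill(5.5, 3)
# run SAMIN from several random starting points and collect the best values found
vals = [Optim.minimum(Optim.optimize(f, lb, ub, lb .+ rand(3) .* (ub .- lb),
            Optim.SAMIN(rt = 0.9, verbosity = 0),
            Optim.Options(iterations = 10^6))) for _ in 1:5]
# if this is false, the runs disagree: increase rt (slower cooling) and rerun
all(v -> isapprox(v, minimum(vals); atol = 1e-6), vals)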
I ran it on v0.7. I’ve updated the benchmark script, but there may be some issues remaining on v1.0.
The BBOB functions are designed to test different aspects of the optimisers (it’s well explained in the docs); one would check which functions SAMIN fails to solve and try to understand why. I’ve added an example script here:
Sphere => 1.0
Ellipsoidal Function => 1.0
Discus Function => 0.0
Bent Cigar Function => 0.15
Sharp Ridge Function => 0.0
Different Powers Function => 0.65
Rastrigin Function => 0.1
Weierstrass Function => 0.95
Schaffers F7 Function => 1.0
Schaffers F7 Function, moderately ill-conditioned => 0.25
Composite Griewank-Rosenbrock Function F8F2 => 0.05
Ellipsoidal => 1.0
Schwefel Function => 0.7
Rastrigin => 0.5
Buche-Rastrigin => 0.6
Linear Slope => 1.0
Attractive Sector => 1.0
Step Ellipsoidal Function => 1.0
Rosenbrock Function, original => 1.0
Rosenbrock Function, rotated => 1.0
Thanks for the script; it’s interesting to play with. Below is a version that tries to get the right answer, even if it takes some time. The run length limit is set much higher, and SA is tuned to be more conservative. The results using Ntrials=10 are:
Sphere => 1.0
Ellipsoidal Function => 1.0
Discus Function => 1.0
Bent Cigar Function => 1.0
Sharp Ridge Function => 0.0
Different Powers Function => 1.0
Rastrigin Function => 1.0
Weierstrass Function => 0.9
Schaffers F7 Function => 1.0
Schaffers F7 Function, moderately ill-conditioned => 1.0
Composite Griewank-Rosenbrock Function F8F2 => 0.9
Ellipsoidal => 1.0
Schwefel Function => 1.0
Rastrigin => 1.0
Buche-Rastrigin => 1.0
Linear Slope => 1.0
Attractive Sector => 1.0
Step Ellipsoidal Function => 1.0
Rosenbrock Function, original => 1.0
Rosenbrock Function, rotated => 1.0
The Sharp Ridge function seems pretty tough! Is it possible that there’s an error in the coded true values? The thing I’m happy to see is that, in most cases, it is possible to get the solution if you’re willing to wait long enough.
The script:
using BlackBoxOptimizationBenchmarking, Optim
import BlackBoxOptimizationBenchmarking: minimizer, minimum, optimize
import Base.string
const BBOB = BlackBoxOptimizationBenchmarking
box(D) = fill((-5.5, 5.5),D)
pinit(D) = 10*rand(D).-5
optimize(opt::Optim.SAMIN,f,D,run_length) =
    Optim.optimize(f, fill(-5.5,D), fill(5.5,D), pinit(D), opt, Optim.Options(f_calls_limit=run_length,g_tol=1e-120,iterations=run_length))
run_length = 10000000
dimensions = 3
Ntrials = 10
Δf = 1e-6
#=
# test on single function
f = BlackBoxOptimizationBenchmarking.F12 # 11 is discus, 12 bent cigar
success_rate, x_dist, f_dist, runtime = BBOB.benchmark(Optim.SAMIN(rt=0.95, nt=20, ns=20, f_tol=1e-12, x_tol=1e-6, verbosity=2), f, run_length, Ntrials, dimensions, Δf)
println(success_rate)
=#
#test on all functions
perf = [(f=>BBOB.benchmark(Optim.SAMIN(rt=0.95, nt=30, ns=30, f_tol=1e-12, x_tol=1e-6, verbosity=0), f, run_length, Ntrials, dimensions, Δf)[1][1]) for f in enumerate(BBOBFunction)]
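To print the per-function success rates in the format listed above, something like this could be appended to the script:

# print "FunctionName => success_rate" for each benchmarked function
foreach(println, perf)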
I’m testing that the minimum is correct when loading the package, so it should be fine. You can always check with:
BBOB.F13(BBOB.F13.x_opt[1:3]) == BBOB.F13.f_opt
I think it’s a final convergence issue: if you look at the parameters that SAMIN returns, they’re pretty close to the minimum but not quite there. That’s why I chain it (and other optimizers) with NelderMead in the benchmark.
Yes, with even more conservative tuning, SA also converges for that function, but it’s quite a difficult case; I’ve never encountered anything like that in my “real life” applications.
Apologies for resurrecting this ancient thread, but reading a new related paper today reminded me of this discussion and I thought people might be interested:
Benchmarking Global Optimizers - Antoine Arnoud, Fatih Guvenen, Tatjana Kleineberg
I implemented the TikTak multistart algorithm from that paper in Julia (MultistartOptimization.jl):
The difference between the results that Arnoud, Guvenen, and Kleineberg describe and my implementation is that they use Nelder-Mead and DFNLS as the local solver, while I allow all the derivative-free optimizers in NLopt.jl as local solvers. These of course do not include DFNLS, but @stevengj’s LN_NEWUOA_BOUND is very competitive.
For some preliminary benchmarks, the following table has the number of function evaluations (the lower the better) for TikTak with 100 Sobol initial points, dimension 10, and local search terminating with absolute tolerance 1e-8 in the position:
| | ShiftedQuadratic | Griewank | LevyMontalvo2 | Rastrigin | Rosenbrock |
|---|---|---|---|---|---|
| LN_BOBYQA | 569 | 2633 | 4235 | FAIL | 10995 |
| LN_NELDERMEAD | 15750 | 17108 | 33088 | FAIL | 42785 |
| LN_NEWUOA_BOUND | 580 | 2088 | 2253 | FAIL | 13409 |
| LN_SBPLX | 12329 | 11806 | 11447 | FAIL | 7020038 |
| LN_COBYLA | 16943 | 37414 | 32792 | FAIL | 985676 |
| LN_PRAXIS | 1850 | 9886 | 8548 | FAIL | 15436 |
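For reference, swapping local solvers is just a matter of passing a different algorithm to NLoptLocalMethod; a minimal sketch (placeholder objective and bounds, not the benchmark harness itself):

using MultistartOptimization, NLopt
# a Rastrigin-style test objective in 10 dimensions (placeholder, standard box)
f(x) = 10 * length(x) + sum(abs2.(x) .- 10 .* cos.(2π .* x))
P = MinimizationProblem(f, fill(-5.12, 10), fill(5.12, 10))
# TikTak with 100 Sobol points, trying two of the local solvers from the table
for m in (NLopt.LN_BOBYQA, NLopt.LN_NEWUOA_BOUND)
    @show m multistart_minimization(TikTak(100), NLoptLocalMethod(m), P)
end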
I wanted to try this for some indirect inference economics problems I am working on, benchmarks on these will follow.
Here’s an example of running IntervalOptimisation.jl on the Rastrigin function in 10 dimensions (the second argument to IntervalBox). The result should be a guarantee that the global minimum value and the minimizers inside the box X are contained in the resulting interval boxes (Cartesian products of intervals).
julia> using IntervalArithmetic, IntervalOptimisation
julia> rastrigin(x) = 10 * length(x) + sum(abs2.(x) .- 10 .* cos.(2*Interval(pi) .* x));
julia> X = IntervalBox(-6..6, 10)
[-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6]
julia> @time minimum, minimisers = minimise(rastrigin, X, tol=1e-6);
0.016861 seconds (314.62 k allocations: 13.250 MiB)
julia> minimum
[0, 0]
julia> minimisers
1-element Array{IntervalBox{10,Float64},1}:
[-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07]
(The objects in [...] are Intervals. I think we should change the printing to use .. instead.)
I find interval methods very elegant, and I wish I could use them. But unfortunately, in indirect inference the objective is usually simulated, with many branches and local discontinuities (if you look at the appendix of the paper linked above, you will see example plots of this). AFAIK interval methods do not deal with these well, e.g.
using IntervalArithmetic, IntervalOptimisation
B = IntervalBox(-6..6, 10)
function f0(x)
    if x > 6
        x + 2
    elseif x < 3
        x + 1
    elseif x < 1     # never reached: the x < 3 branch above already covers this
        x + 0.5
    else
        x
    end
end
f(x) = sum(f0 ∘ abs, x)
minimise(f, B, tol=1e-6)
keeps running forever for me (I killed it after 10 minutes), while
using MultistartOptimization, NLopt
multistart_minimization(TikTak(100), NLoptLocalMethod(NLopt.LN_NEWUOA_BOUND),
                        MinimizationProblem(f, fill(-6, 10), fill(6, 10)))
finds an optimum in seconds.
Just to put the whole exercise in context: the functions that people came up with to torture (test) various global optimization algorithms are frequently stand-ins for real-life functions that exhibit some similar property (e.g. a bunch of local extrema like Rastrigin, or misleading steep valleys like Rosenbrock), while at the same time, for practical reasons, lacking some other important properties of those real functions (such as nondifferentiability and sometimes discontinuity).
I like to check how simulated annealing does with the problems that come up, just to satisfy my curiosity.
Here’s SA applied to Tamas_Papp’s example, using a random start value between -10 and 10. The minimizer converges to 0.0 (objective value 1.0) every time I try it. Just an illustration that it works with discontinuous and nondifferentiable objectives:
julia> @time samin(f,[20.0*rand()-10.0], [-10.0], [10.0])
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 975
Obj. value: 1.0000000000
parameter search width
-0.00000 0.00000
____________________________________________________________
0.002172 seconds (3.05 k allocations: 223.984 KiB)
And here it is applied to dpsanders’s Rastrigin function. Getting it to work requires very slow cooling, and it takes longer than the interval optimization method.
julia> @time samin(rastrigin,12.0*rand(10).-6.0,-6.0*ones(10), 6.0*ones(10), nt=5, ns = 3, rt=0.999, maxevals=1e8)
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 3781950
Obj. value: 0.0000000000
parameter search width
-0.00000 0.00000
-0.00000 0.00000
0.00000 0.00001
-0.00000 0.00000
0.00000 0.00000
0.00000 0.00001
-0.00000 0.00000
0.00000 0.00000
0.00000 0.00000
-0.00000 0.00001
____________________________________________________________
1.642120 seconds (9.75 M allocations: 1.455 GiB, 12.27% gc time)
Thanks for exploring this. I wish I had more intuition about choosing a cooling schedule for SA. In practice, what I am always told is that if I fail to get to an optimum (which I know ex ante, in a testing context), then I was cooling too fast.
I think one advantage of using something like a low-discrepancy sequence (in practice, Sobol) for exploration is the more or less uniform coverage. I wonder if it would make sense to combine SA with multistart methods.
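A rough sketch of what I have in mind, combining Sobol start points (via the Sobol.jl package) with SAMIN; everything here is illustrative (placeholder objective, untuned settings):

using Sobol, Optim
f(x) = sum(abs2, x)                         # placeholder objective
lb, ub = fill(-6.0, 10), fill(6.0, 10)
s = SobolSeq(lb, ub)
starts = [next!(s) for _ in 1:20]           # quasi-uniform coverage of the box
results = [Optim.optimize(f, lb, ub, copy(x0), Optim.SAMIN(verbosity = 0),
                          Optim.Options(iterations = 10^6)) for x0 in starts]
best = argmin(Optim.minimum.(results))      # keep the best of the runs
Optim.minimizer(results[best])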
I think another reason why SA is popular is that it’s based on solid theoretical grounds: the classic Metropolis-Hastings algorithm, which was used back in the day to figure out the optimal spin configuration in an Ising model.
The issue is not so much the cooling schedule per se but rather the expensive objective calls. If objective calls are very expensive, one makes a compromise by setting a faster cooling schedule (a lower rt) to “zoom in” on the optimizer quickly.
That being said, there are newer versions of the basic SA algorithm like Adaptive Simulated Annealing which may also be useful to try in the above benchmarks.
I thought that interval methods have, in principle, no difficulty (?) with discontinuities per se.
They do have a problem with control flow, although we have been discussing how to mitigate that with Julia.
But then I tried a version of your function and it indeed doesn’t seem to work so well:
H(x::Real) = x <= 0 ? zero(x) : one(x)
function H(x::Interval{T}) where {T}
    x.hi <= 0 && return zero(x)
    x.lo > 0 && return one(x)
    return Interval{T}(0, 1)
end
function f1(x)
    return H(x - 6) * (x + 2) +
        (1 - H(x - 6)) * (
            (1 - H(x - 3)) * (x + 1) +
            H(x - 3) * x
        )
end
f1(3)
f(x) = sum(f1 ∘ abs, x)
B = IntervalBox(-1..1, 2)
minimise(f, B, tol=0.01)
gives a solution in 1 second (in 2 dimensions, with a very small box).
But in general there just seem to be too many pieces that it is unable to discard.