Latest recommendations for global optimization

I should have mentioned this earlier. I wanted to try it myself, but couldn't really get it to work in the time I had available. Also, before looking at the actual code, I thought R-Optim.NelderMead in the plot was Nelder-Mead from R's optim :)

You may be interested in this example of using simulated annealing to estimate a small DSGE model by GMM: https://github.com/mcreel/Econometrics/blob/master/Examples/DSGE/GMM/DoGMM.jl

Some discussion of this is found in the document econometrics.pdf, which is in the top level of the repo that contains this example. In my experience, Bayesian GMM-style methods are more reliable for this sort of model.

1 Like

If the function evaluation is expensive, you may also be interested in https://github.com/jbrea/BayesOpt.jl

4 Likes

I’ve added SAMIN and made a couple of changes, now the best algorithm seems to be some of the BlackBoxOptim.jl ones. They don’t have very good final convergence, so I chained them into Nelder-Mead, which improved their performance in the benchmark.
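For anyone who wants to reproduce the chaining, here's a minimal sketch of the idea; the objective, dimension, and budgets below are placeholders for illustration, not the settings the benchmark uses:

using BlackBoxOptim, Optim

f(x) = sum(abs2, x) # placeholder objective; substitute a BBOB test function

# Global phase: a BlackBoxOptim method with a limited evaluation budget.
res = bboptimize(f; SearchRange = (-5.0, 5.0), NumDimensions = 10,
                 MaxFuncEvals = 5000, TraceMode = :silent)

# Local phase: polish the best candidate with Nelder-Mead to fix up
# the poor final convergence of the global search.
polished = optimize(f, best_candidate(res), NelderMead(),
                    Optim.Options(f_calls_limit = 5000))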

I should run the benchmark with more repeats and at higher dimension, but the Python optimisers are so slow that it already takes quite a long time to complete. Maybe there are still some issues with the options too.

2 Likes

Thanks for looking at this again! Do you run this on v1.0.x?

Edit: Just had a look; yeah, this seems much more in line with what I'd expect. SAMIN in action, @mcreel! It does quite well considering the winners are designed to handle very tough landscapes.

Depending on how irregular the test functions are, SAMIN's performance could possibly be improved by setting the temperature reduction factor rt, which defaults to 0.9, to a lower value, e.g., 0.5. This speeds it up but increases the chances of missing the global optimum. The default 0.9 is pretty conservative, designed to deal with challenging functions.
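In Optim's interface that is just a keyword argument of SAMIN; a minimal sketch, where the objective, bounds, and budget are placeholders:

using Optim

f(x) = sum(abs2, x) # placeholder objective

# rt = 0.5 reduces the temperature faster than the default 0.9:
# cheaper runs, but a higher chance of missing the global optimum.
Optim.optimize(f, fill(-5.0, 3), fill(5.0, 3), zeros(3),
               Optim.SAMIN(rt = 0.5),
               Optim.Options(iterations = 10^6))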

1 Like

Some are quite tough I’d say, although you cannot really put a number on toughness: http://coco.lri.fr/downloads/download15.01/bbobdocfunctions.pdf

I agree that toughness is hard to quantify without experimentation. When I use SAMIN, I run it several times to verify that I always get the same final result. If I don’t, I set rt closer to 1, and start again. But doing that in a benchmarking framework would not be fun. I just wanted to point out that the default value is not necessarily a good choice for these test functions.
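That manual procedure is easy to script. A rough sketch of the rerun-until-stable idea, where the number of runs, tolerance, and rt schedule are arbitrary illustrations rather than recommendations:

using Optim

# Rerun SAMIN from random starts; if the runs disagree on the minimum,
# move rt closer to 1 (slower cooling) and start again.
function stable_minimum(f, lb, ub; rt = 0.5, nruns = 5, tol = 1e-6)
    while true
        vals = map(1:nruns) do _
            x0 = lb .+ (ub .- lb) .* rand(length(lb))
            res = Optim.optimize(f, lb, ub, x0,
                                 Optim.SAMIN(rt = rt, verbosity = 0),
                                 Optim.Options(iterations = 10^6))
            Optim.minimum(res)
        end
        maximum(vals) - minimum(vals) < tol && return minimum(vals)
        rt = (rt + 1) / 2
    end
end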

1 Like

I ran it on v0.7. I've updated the benchmark script, but there may be some issues remaining on v1.0.

The BBOB functions are designed to test different aspects of the optimisers (it's well explained in the docs); one would check which functions SAMIN fails to solve and try to understand why. I've added an example script; here are the results:

                                            Sphere => 1.0 
                              Ellipsoidal Function => 1.0 
                                   Discus Function => 0.0 
                               Bent Cigar Function => 0.15
                              Sharp Ridge Function => 0.0 
                         Different Powers Function => 0.65
                                Rastrigin Function => 0.1 
                              Weierstrass Function => 0.95
                             Schaffers F7 Function => 1.0 
 Schaffers F7 Function, moderately ill-conditioned => 0.25
       Composite Griewank-Rosenbrock Function F8F2 => 0.05
                                       Ellipsoidal => 1.0 
                                 Schwefel Function => 0.7 
                                         Rastrigin => 0.5 
                                   Buche-Rastrigin => 0.6 
                                      Linear Slope => 1.0 
                                 Attractive Sector => 1.0 
                         Step Ellipsoidal Function => 1.0 
                     Rosenbrock Function, original => 1.0 
                      Rosenbrock Function, rotated => 1.0 
1 Like

Thanks for the script, it’s interesting to play with. Below is a version that tries to get the right answer, even if it takes some time. The run length limit is set much higher, and SA is tuned to be more conservative. The results using Ntrials=10 are

                                            Sphere => 1.0
                              Ellipsoidal Function => 1.0
                                   Discus Function => 1.0
                               Bent Cigar Function => 1.0
                              Sharp Ridge Function => 0.0
                         Different Powers Function => 1.0
                                Rastrigin Function => 1.0
                              Weierstrass Function => 0.9
                             Schaffers F7 Function => 1.0
 Schaffers F7 Function, moderately ill-conditioned => 1.0
       Composite Griewank-Rosenbrock Function F8F2 => 0.9
                                       Ellipsoidal => 1.0
                                 Schwefel Function => 1.0
                                         Rastrigin => 1.0
                                   Buche-Rastrigin => 1.0
                                      Linear Slope => 1.0
                                 Attractive Sector => 1.0
                         Step Ellipsoidal Function => 1.0
                     Rosenbrock Function, original => 1.0
                      Rosenbrock Function, rotated => 1.0

The Sharp Ridge function seems pretty tough! Is it possible that there’s an error in the coded true values? The thing I’m happy to see is that, in most cases, it is possible to get the solution if you’re willing to wait long enough.

The script:

using BlackBoxOptimizationBenchmarking, Optim
import BlackBoxOptimizationBenchmarking: minimizer, minimum, optimize
import Base.string

const BBOB = BlackBoxOptimizationBenchmarking

box(D) = fill((-5.5, 5.5), D)
pinit(D) = 10 * rand(D) .- 5   # random start point in (-5, 5)^D

# SAMIN needs box constraints and a start point, so give it its own method.
optimize(opt::Optim.SAMIN, f, D, run_length) =
    Optim.optimize(f, fill(-5.5, D), fill(5.5, D), pinit(D), opt,
                   Optim.Options(f_calls_limit = run_length, g_tol = 1e-120,
                                 iterations = run_length))

run_length = 10_000_000
dimensions = 3
Ntrials = 10
Δf = 1e-6

#=
# test on a single function
f = BlackBoxOptimizationBenchmarking.F12 # 11 is Discus, 12 is Bent Cigar
success_rate, x_dist, f_dist, runtime = BBOB.benchmark(
    Optim.SAMIN(rt = 0.95, nt = 20, ns = 20, f_tol = 1e-12, x_tol = 1e-6, verbosity = 2),
    f, run_length, Ntrials, dimensions, Δf)
println(success_rate)
=#

# test on all functions
perf = [f => BBOB.benchmark(
            Optim.SAMIN(rt = 0.95, nt = 30, ns = 30, f_tol = 1e-12, x_tol = 1e-6, verbosity = 0),
            f, run_length, Ntrials, dimensions, Δf)[1][1]
        for f in enumerate(BBOBFunction)]
2 Likes

I'm testing that the minimum is correct when loading the package, so it should be fine. You can always check with:

    BBOB.F13(BBOB.F13.x_opt[1:3]) == BBOB.F13.f_opt

I think it's a final-convergence issue: if you look at the parameters that SAMIN returns, they're pretty close to the minimum but not quite there. That's why I chain it (and the other optimizers) with NelderMead in the benchmark.

Yes, with even more conservative tuning, SA also converges for that function, but it's quite a difficult case; I've never encountered anything like it in my "real life" applications.

Apologies for resurrecting this ancient thread, but reading a new related paper today reminded me of this discussion and I thought people might be interested:

Benchmarking Global Optimizers - Antoine Arnoud, Fatih Guvenen, Tatjana Kleineberg

2 Likes

I implemented the TikTak multistart algorithm from that paper in Julia, as MultistartOptimization.jl:

The difference between the results that Arnoud, Guvenen, and Kleineberg describe and my implementation is that they use Nelder-Mead and DFNLS as the local solver, while I allow all the derivative-free optimizers in NLopt.jl as local solvers. These of course do not include DFNLS, but @stevengj’s LN_NEWUOA_BOUND is very competitive.

For some preliminary benchmarks, the following table has the number of function evaluations (the lower the better) for

  • TikTak with 100 Sobol initial points,
  • dimension 10,
  • local search terminating with absolute tolerance 1e-8 in the position
                 ShiftedQuadratic  Griewank  LevyMontalvo2  Rastrigin  Rosenbrock
LN_BOBYQA                     569      2633           4235       FAIL       10995
LN_NELDERMEAD               15750     17108          33088       FAIL       42785
LN_NEWUOA_BOUND               580      2088           2253       FAIL       13409
LN_SBPLX                    12329     11806          11447       FAIL     7020038
LN_COBYLA                   16943     37414          32792       FAIL      985676
LN_PRAXIS                    1850      9886           8548       FAIL       15436

I wanted to try this for some indirect inference economics problems I am working on; benchmarks on these will follow.

18 Likes

Here’s an example of running IntervalOptimisation.jl on the Rastrigin function in 10 dimensions (the second argument to IntervalBox).

The result should be a guarantee that the global minimum value and minimizers inside the box X are inside the resulting interval boxes (Cartesian product of intervals).

julia> using IntervalArithmetic, IntervalOptimisation

julia> rastrigin(x) = 10 * length(x) + sum(abs2.(x) .- 10 .* cos.(2*Interval(pi) .* x));

julia> X = IntervalBox(-6..6, 10)
[-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6]

julia> @time minimum, minimisers = minimise(rastrigin, X, tol=1e-6);
  0.016861 seconds (314.62 k allocations: 13.250 MiB)

julia> minimum
[0, 0]

julia> minimisers
1-element Array{IntervalBox{10,Float64},1}:
 [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07]

(The objects in [...] are Intervals. I think we should change the printing to use .. instead.)

5 Likes

I find interval methods very elegant, and I wish I could use them. But unfortunately, in indirect inference the objective is usually simulated, with many branches and local discontinuities (if you look at the appendix of the paper linked above, you will see example plots of this). AFAIK interval methods do not deal with these well; for example,

using IntervalArithmetic, IntervalOptimisation
B = IntervalBox(-6..6, 10)
function f0(x)
    if x > 6
        x + 2
    elseif x < 3
        x + 1
    elseif x < 1
        x + 0.5
    else
        x
    end
end
f(x) = sum(f0 ∘ abs, x)
minimise(f, B, tol=1e-6)

keeps running forever for me (I killed it after 10 minutes), while

using MultistartOptimization, NLopt
multistart_minimization(TikTak(100), NLoptLocalMethod(NLopt.LN_NEWUOA_BOUND),
                        MinimizationProblem(f, fill(-6, 10), fill(6, 10)))

finds an optimum in seconds.

Just to put the whole exercise in context: the functions that people came up with to torture (test) various global optimization algorithms are frequently stand-ins for real-life functions that exhibit some similar property (e.g. a bunch of local extrema like Rastrigin, or misleading steep valleys like Rosenbrock), while for practical reasons failing to reproduce other important features of the real objectives, such as nondifferentiability and sometimes discontinuity.

2 Likes

I like to check how simulated annealing does with the problems that come up, just to satisfy my curiosity.

Here's SA applied to Tamas_Papp's example, using a random start value between -10 and 10. The minimizer converges to 0.0 (with objective value 1.0) every time I try it. Just an illustration that it works with discontinuous and nondifferentiable objectives:

julia> @time samin(f,[20.0*rand()-10.0], [-10.0], [10.0])
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 975

     Obj. value:      1.0000000000

       parameter      search width
        -0.00000           0.00000 
____________________________________________________________
  0.002172 seconds (3.05 k allocations: 223.984 KiB)

And here it is applied to dpsanders' Rastrigin function. Getting it to work requires very slow cooling, and it takes longer than the interval optimization method.

julia> @time samin(rastrigin,12.0*rand(10).-6.0,-6.0*ones(10), 6.0*ones(10), nt=5, ns = 3, rt=0.999, maxevals=1e8)
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 3781950

     Obj. value:      0.0000000000

       parameter      search width
        -0.00000           0.00000 
        -0.00000           0.00000 
         0.00000           0.00001 
        -0.00000           0.00000 
         0.00000           0.00000 
         0.00000           0.00001 
        -0.00000           0.00000 
         0.00000           0.00000 
         0.00000           0.00000 
        -0.00000           0.00001 
____________________________________________________________
  1.642120 seconds (9.75 M allocations: 1.455 GiB, 12.27% gc time)
1 Like

Thanks for exploring this. I wish I had more intuition about choosing a cooling schedule for SA. In practice, what I am always told is that if I fail to get to an optimum (which I know ex ante, in a testing context), then I was cooling too fast.

I think one advantage of using a low-discrepancy sequence (in practice, Sobol) for exploration is more or less uniform coverage. I wonder if it would make sense to combine SA with multistart methods.
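As a crude sketch of that combination, one could draw the start points from Sobol.jl and run SAMIN from each; everything below is illustrative and untuned, and f is a placeholder:

using Sobol, Optim

f(x) = sum(abs2, x) # placeholder objective
lb, ub = fill(-5.0, 3), fill(5.0, 3)

# Low-discrepancy start points give roughly uniform coverage of the box.
seq = SobolSeq(lb, ub)
starts = [next!(seq) for _ in 1:20]

runs = [Optim.optimize(f, lb, ub, x0,
                       Optim.SAMIN(rt = 0.5, verbosity = 0),
                       Optim.Options(iterations = 10^5))
        for x0 in starts]

# Keep the best of the runs.
best = runs[argmin(Optim.minimum.(runs))]
Optim.minimizer(best)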

I think another reason why SA is popular is that it's based on solid theoretical grounds: the classic Metropolis-Hastings algorithm, which was used back in the day to figure out the optimal spin configuration in an Ising model.

The issue is not so much the cooling schedule per se as the expensive objective calls. If objective calls are very expensive, one makes a compromise by cooling faster (lower rt) to "zoom in" on the optimizer quickly.

That being said, there are newer versions of the basic SA algorithm like Adaptive Simulated Annealing which may also be useful to try in the above benchmarks.

I thought that interval methods have, in principle, no difficulty (?) with discontinuities per se.
They do have a problem with control flow, although we have been discussing how to mitigate that with Julia.

But then I tried a version of your function and it indeed doesn’t seem to work so well:


H(x::Real) = x <= 0 ? zero(x) : one(x)

# Interval extension of the Heaviside step: returns the whole interval
# [0, 1] when the argument straddles the jump.
function H(x::Interval{T}) where {T}
    x.hi <= 0 && return zero(x)
    x.lo > 0 && return one(x)
    return Interval{T}(0, 1)
end

function f1(x)
    return H(x - 6) * (x + 2) +
           (1 - H(x - 6)) * (
               (1 - H(x - 3)) * (x + 1)
               + (x - 3) * x
           )
end

f1(3)

f(x) = sum(f1 ∘ abs, x)
B = IntervalBox(-1..1, 2)

minimise(f, B, tol=0.01)

gives a solution in 1 second (in 2 dimensions, with a very small box).

But in general there just seem to be too many pieces that it is unable to discard.

1 Like