I should have mentioned this earlier; I wanted to do it myself but couldn’t really get it to work in the time I had available. Also, before looking at the actual code, I thought R-Optim.NelderMead in the plot was Nelder-Mead from R’s optim.
You may be interested in this example of using simulated annealing to estimate a small DSGE model by GMM: https://github.com/mcreel/Econometrics/blob/master/Examples/DSGE/GMM/DoGMM.jl
Some discussion of this is found in the document econometrics.pdf, which is in the top level of the repo that contains this example. In my experience, Bayesian GMM-style methods are more reliable for this sort of model.
If the function evaluation is expensive, you may also be interested in BayesOpt.jl (https://github.com/jbrea/BayesOpt.jl).
I’ve added SAMIN and made a couple of changes; the best algorithms now seem to be some of the BlackBoxOptim.jl ones. They don’t have very good final convergence, so I chained them into Nelder-Mead, which improved their performance in the benchmark.
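Conceptually the chaining is just a global search followed by a local polish; a minimal sketch (stand-in objective, placeholder dimension and budget, not the actual benchmark code):

using BlackBoxOptim, Optim
f(x) = sum(abs2, x)                           # stand-in objective
D = 3                                         # placeholder dimension
# global phase: BlackBoxOptim's default adaptive differential evolution
res = bboptimize(f; SearchRange = (-5.5, 5.5), NumDimensions = D,
                 MaxFuncEvals = 10_000, TraceMode = :silent)
# local phase: polish the best candidate with Nelder-Mead
polished = Optim.optimize(f, best_candidate(res), NelderMead())
Optim.minimizer(polished), Optim.minimum(polished)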
I should run the benchmark with more repeats and at higher dimension, but the Python optimisers are so slow that it already takes quite a long time to complete. There may also still be some issues with the options.
Thanks for looking at this again! Do you run this on v1.0.x?
Edit: Just had a look; yeah, this seems much more in line with what I’d expect. SAMIN in action, @mcreel! It does quite well considering the winners are designed to handle very tough landscapes.
Depending on how irregular the test functions are, SAMIN’s performance could possibly be improved by setting the temperature reduction factor (rt), which is 0.9 by default, to a lower value, e.g., 0.5. This speeds it up but increases the chance of missing the global optimum. The default 0.9 is pretty conservative, designed to deal with challenging functions.
Some are quite tough I’d say, although you cannot really put a number on toughness: http://coco.lri.fr/downloads/download15.01/bbobdocfunctions.pdf
I agree that toughness is hard to quantify without experimentation. When I use SAMIN, I run it several times to verify that I always get the same final result. If I don’t, I set rt closer to 1, and start again. But doing that in a benchmarking framework would not be fun. I just wanted to point out that the default value is not necessarily a good choice for these test functions.
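The check I mean is something like the sketch below (placeholder objective and bounds; rt and the budget are illustrative): run SAMIN from a few random starts, compare the minima, and if they differ, set rt closer to 1 and rerun.

using Optim
f(x) = sum(abs2, x)                              # placeholder objective
lb, ub = fill(-5.5, 3), fill(5.5, 3)
# run SAMIN from several random starting points and collect the best values found
vals = [Optim.minimum(Optim.optimize(f, lb, ub, lb .+ rand(3) .* (ub .- lb),
            Optim.SAMIN(rt = 0.9, verbosity = 0),
            Optim.Options(iterations = 10^6))) for _ in 1:5]
# if this is false, the runs disagree: increase rt (slower cooling) and rerun
all(v -> isapprox(v, minimum(vals); atol = 1e-6), vals)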
I ran it on v0.7. I’ve updated the benchmark script, but there may be some issues remaining on v1.0.
The BBOB functions are designed to test different aspects of the optimisers (it’s well explained in the docs); one would check which functions SAMIN fails to solve and try to understand why. I’ve added an example script here:
Sphere => 1.0
Ellipsoidal Function => 1.0
Discus Function => 0.0
Bent Cigar Function => 0.15
Sharp Ridge Function => 0.0
Different Powers Function => 0.65
Rastrigin Function => 0.1
Weierstrass Function => 0.95
Schaffers F7 Function => 1.0
Schaffers F7 Function, moderately ill-conditioned => 0.25
Composite Griewank-Rosenbrock Function F8F2 => 0.05
Ellipsoidal => 1.0
Schwefel Function => 0.7
Rastrigin => 0.5
Buche-Rastrigin => 0.6
Linear Slope => 1.0
Attractive Sector => 1.0
Step Ellipsoidal Function => 1.0
Rosenbrock Function, original => 1.0
Rosenbrock Function, rotated => 1.0
Thanks for the script; it’s interesting to play with. Below is a version that tries to get the right answer, even if it takes some time. The run length limit is set much higher, and SA is tuned to be more conservative. The results using Ntrials=10 are:
Sphere => 1.0
Ellipsoidal Function => 1.0
Discus Function => 1.0
Bent Cigar Function => 1.0
Sharp Ridge Function => 0.0
Different Powers Function => 1.0
Rastrigin Function => 1.0
Weierstrass Function => 0.9
Schaffers F7 Function => 1.0
Schaffers F7 Function, moderately ill-conditioned => 1.0
Composite Griewank-Rosenbrock Function F8F2 => 0.9
Ellipsoidal => 1.0
Schwefel Function => 1.0
Rastrigin => 1.0
Buche-Rastrigin => 1.0
Linear Slope => 1.0
Attractive Sector => 1.0
Step Ellipsoidal Function => 1.0
Rosenbrock Function, original => 1.0
Rosenbrock Function, rotated => 1.0
The Sharp Ridge function seems pretty tough! Is it possible that there’s an error in the coded true values? The thing I’m happy to see is that, in most cases, it is possible to get the solution if you’re willing to wait long enough.
The script:
using BlackBoxOptimizationBenchmarking, Optim
import BlackBoxOptimizationBenchmarking: minimizer, minimum, optimize
import Base.string
const BBOB = BlackBoxOptimizationBenchmarking
box(D) = fill((-5.5, 5.5),D)
pinit(D) = 10*rand(D).-5
optimize(opt::Optim.SAMIN,f,D,run_length) =
    Optim.optimize(f, fill(-5.5,D), fill(5.5,D), pinit(D), opt, Optim.Options(f_calls_limit=run_length,g_tol=1e-120,iterations=run_length))
run_length = 10000000
dimensions = 3
Ntrials = 10
Δf = 1e-6
#=
# test on single function
f = BlackBoxOptimizationBenchmarking.F12 # 11 is discus, 12 bent cigar
success_rate, x_dist, f_dist, runtime = BBOB.benchmark(Optim.SAMIN(rt=0.95, nt=20, ns=20, f_tol=1e-12, x_tol=1e-6, verbosity=2), f, run_length, Ntrials, dimensions, Δf)
println(success_rate)
=#
#test on all functions
perf = [(f=>BBOB.benchmark(Optim.SAMIN(rt=0.95, nt=30, ns=30, f_tol=1e-12, x_tol=1e-6, verbosity=0), f, run_length, Ntrials, dimensions, Δf)[1][1]) for f in enumerate(BBOBFunction)]
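To print the per-function success rates in the format listed above, something like this could be appended to the script:

# print "FunctionName => success_rate" for each benchmarked function
foreach(println, perf)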
I’m testing that the minimum is correct when loading the package, so it should be fine. You can always check with:
BBOB.F13(BBOB.F13.x_opt[1:3]) == BBOB.F13.f_opt
I think it’s a final convergence issue: if you look at the parameters that SAMIN returns, they’re pretty close to the minimum but not quite there. That’s why I chain it (and other optimizers) with NelderMead in the benchmark.
Yes, with even more conservative tuning, SA also converges for that function, but it’s quite a difficult case; I’ve never encountered anything like that in my “real life” applications.
Apologies for resurrecting this ancient thread, but reading a new related paper today reminded me of this discussion and I thought people might be interested:
Benchmarking Global Optimizers - Antoine Arnoud, Fatih Guvenen, Tatjana Kleineberg
I implemented the TikTak multistart algorithm from that paper in Julia (MultistartOptimization.jl):
The difference between the results that Arnoud, Guvenen, and Kleineberg describe and my implementation is that they use Nelder-Mead and DFNLS as the local solver, while I allow all the derivative-free optimizers in NLopt.jl as local solvers. These of course do not include DFNLS, but @stevengj’s LN_NEWUOA_BOUND is very competitive.
For some preliminary benchmarks, the following table has the number of function evaluations (the lower the better) for TikTak with 100 Sobol initial points, dimension 10, and local search terminating with absolute tolerance 1e-8 in the position:
| | ShiftedQuadratic | Griewank | LevyMontalvo2 | Rastrigin | Rosenbrock |
|---|---|---|---|---|---|
| LN_BOBYQA | 569 | 2633 | 4235 | FAIL | 10995 |
| LN_NELDERMEAD | 15750 | 17108 | 33088 | FAIL | 42785 |
| LN_NEWUOA_BOUND | 580 | 2088 | 2253 | FAIL | 13409 |
| LN_SBPLX | 12329 | 11806 | 11447 | FAIL | 7020038 |
| LN_COBYLA | 16943 | 37414 | 32792 | FAIL | 985676 |
| LN_PRAXIS | 1850 | 9886 | 8548 | FAIL | 15436 |
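For reference, swapping local solvers is just a matter of passing a different algorithm to NLoptLocalMethod; a minimal sketch (placeholder objective and bounds, not the benchmark harness itself):

using MultistartOptimization, NLopt
# a Rastrigin-style test objective in 10 dimensions (placeholder, standard box)
f(x) = 10 * length(x) + sum(abs2.(x) .- 10 .* cos.(2π .* x))
P = MinimizationProblem(f, fill(-5.12, 10), fill(5.12, 10))
# TikTak with 100 Sobol points, trying two of the local solvers from the table
for m in (NLopt.LN_BOBYQA, NLopt.LN_NEWUOA_BOUND)
    @show m multistart_minimization(TikTak(100), NLoptLocalMethod(m), P)
end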
I wanted to try this for some indirect inference economics problems I am working on, benchmarks on these will follow.
Here’s an example of running IntervalOptimisation.jl on the Rastrigin function in 10 dimensions (the second argument to IntervalBox). The result should be a guarantee that the global minimum value and the minimizers inside the box X are contained in the resulting interval boxes (Cartesian products of intervals).
julia> using IntervalArithmetic, IntervalOptimisation
julia> rastrigin(x) = 10 * length(x) + sum(abs2.(x) .- 10 .* cos.(2*Interval(pi) .* x));
julia> X = IntervalBox(-6..6, 10)
[-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6] × [-6, 6]
julia> @time minimum, minimisers = minimise(rastrigin, X, tol=1e-6);
0.016861 seconds (314.62 k allocations: 13.250 MiB)
julia> minimum
[0, 0]
julia> minimisers
1-element Array{IntervalBox{10,Float64},1}:
[-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07] × [-3.19333e-07, 2.62642e-07]
(The objects in [...] are Intervals. I think we should change the printing to use .. instead.)
I find interval methods very elegant, and I wish I could use them. But unfortunately, in indirect inference the objective is usually simulated, with many branches and local discontinuities (if you look at the appendix of the paper linked above, you will see example plots of this). AFAIK interval methods do not deal with these well, e.g.
using IntervalArithmetic, IntervalOptimisation
B = IntervalBox(-6..6, 10)
function f0(x)
    if x > 6
        x + 2
    elseif x < 3
        x + 1
    elseif x < 1     # never reached: the x < 3 branch above already covers this
        x + 0.5
    else
        x
    end
end
f(x) = sum(f0 ∘ abs, x)
minimise(f, B, tol=1e-6)
keeps running forever for me (I killed it after 10 minutes), while
using MultistartOptimization, NLopt
multistart_minimization(TikTak(100), NLoptLocalMethod(NLopt.LN_NEWUOA_BOUND),
                        MinimizationProblem(f, fill(-6, 10), fill(6, 10)))
finds an optimum in seconds.
Just to put the whole exercise in context: the functions that people came up with to torture (test) various global optimization algorithms are frequently stand-ins for real-life functions that exhibit some similar property (e.g. a bunch of local extrema like Rastrigin, or misleading steep valleys like Rosenbrock), while at the same time, for practical reasons, lacking some other important properties of those real functions (such as nondifferentiability and sometimes discontinuity).
I like to check how simulated annealing does with the problems that come up, just to satisfy my curiosity.
Here’s SA applied to Tamas_Papp’s example, using a random start value between -10 and 10. The minimizer converges to 0.0 (objective value 1.0) every time I try it. Just an illustration that it works with discontinuous and nondifferentiable objectives:
julia> @time samin(f,[20.0*rand()-10.0], [-10.0], [10.0])
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 975
Obj. value: 1.0000000000
parameter search width
-0.00000 0.00000
____________________________________________________________
0.002172 seconds (3.05 k allocations: 223.984 KiB)
And here it is applied to dpsanders’s Rastrigin function. Getting it to work requires very slow cooling, and it takes longer than the interval optimization method.
julia> @time samin(rastrigin,12.0*rand(10).-6.0,-6.0*ones(10), 6.0*ones(10), nt=5, ns = 3, rt=0.999, maxevals=1e8)
____________________________________________________________
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 3781950
Obj. value: 0.0000000000
parameter search width
-0.00000 0.00000
-0.00000 0.00000
0.00000 0.00001
-0.00000 0.00000
0.00000 0.00000
0.00000 0.00001
-0.00000 0.00000
0.00000 0.00000
0.00000 0.00000
-0.00000 0.00001
____________________________________________________________
1.642120 seconds (9.75 M allocations: 1.455 GiB, 12.27% gc time)
Thanks for exploring this. I wish I had more intuition about choosing a cooling schedule for SA. In practice, what I am always told is that if I fail to get to an optimum (which I know ex ante, in a testing context), then I was cooling too fast.
I think one advantage of using something like a low-discrepancy sequence (in practice, Sobol) for exploration is the more or less uniform coverage. I wonder if it would make sense to combine SA with multistart methods.
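A rough sketch of what I have in mind, combining Sobol start points (via the Sobol.jl package) with SAMIN; everything here is illustrative (placeholder objective, untuned settings):

using Sobol, Optim
f(x) = sum(abs2, x)                         # placeholder objective
lb, ub = fill(-6.0, 10), fill(6.0, 10)
s = SobolSeq(lb, ub)
starts = [next!(s) for _ in 1:20]           # quasi-uniform coverage of the box
results = [Optim.optimize(f, lb, ub, copy(x0), Optim.SAMIN(verbosity = 0),
                          Optim.Options(iterations = 10^6)) for x0 in starts]
best = argmin(Optim.minimum.(results))      # keep the best of the runs
Optim.minimizer(results[best])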
I think another reason why SA is popular is that it’s based on solid theoretical grounds: the classic Metropolis-Hastings algorithm, which was used back in the day to figure out the optimal spin configuration in an Ising model.
The issue is not so much the cooling schedule per se but rather the expensive objective calls. If objective calls are very expensive, one makes a compromise by setting a faster cooling schedule (a lower rt) to “zoom in” on the optimizer quickly.
That being said, there are newer versions of the basic SA algorithm like Adaptive Simulated Annealing which may also be useful to try in the above benchmarks.
I thought that interval methods have, in principle, no difficulty (?) with discontinuities per se.
They do have a problem with control flow, although we have been discussing how to mitigate that with Julia.
But then I tried a version of your function and it indeed doesn’t seem to work so well:
H(x::Real) = x <= 0 ? zero(x) : one(x)
function H(x::Interval{T}) where {T}
    x.hi <= 0 && return zero(x)
    x.lo > 0 && return one(x)
    return Interval{T}(0, 1)
end
function f1(x)
    return H(x - 6) * (x + 2) +
        (1 - H(x - 6)) * (
            (1 - H(x - 3)) * (x + 1) +
            H(x - 3) * x
        )
end
f1(3)
f(x) = sum(f1 ∘ abs, x)
B = IntervalBox(-1..1, 2)
minimise(f, B, tol=0.01)
gives a solution in 1 second (in 2 dimensions, with a very small box).
But in general there just seem to be too many pieces that it is unable to discard.