I am confused about how to put bounds on parameters using Nelder-Mead in the Optim.jl package. I somehow remember Nelder-Mead should not be used with Fminbox, so I wonder if the following code is correct? Also, I notice that the package NLopt.jl also provides Nelder-Mead algorithm, I wonder if they are the same or which one is better? Thank you.
One option is to simply return a large value for the objective when the constraints are violated. I have not seen this “officially” recommended anywhere, but I have used this trick quite a few times without obvious trouble.
The NLopt version of Nelder-Mead implements a different method of imposing constraints (it moves points outside of the constraints to the inside). Whether this is any better than Optim.jl I do not know.
A long, perhaps off-topic discussion could start from here.
Among economists doing structural work, N-M is still quite popular. Whether this is a good choice or not, is hard to say. Each problem has its own issues and the question “what is the best algorithm?” probably does not have a good answer (unless it is a problem specific one).
One interesting reference for structural economic problems is
Arnoud, Antoine, Fatih Guvenen, and Tatjana Kleineberg. 2019. “Benchmarking Global Optimizers.” Working Paper 26340. National Bureau of Economic Research. https://doi.org/10.3386/w26340.
Sorry - I could not resist going off topic for a bit. I am certainly not an expert on this and would welcome suggestions that I could try on my problems.
What I end up doing is actually building a sort of customized global optimizer for each problem. It first explores the parameter space globally, but only computes a subset of the target moments (or cheap approximations of the target moments). Then it solves several hundred of the “best” points according to this initial search. And finally it runs a local optimizer (usually N-M) on the best point resulting from that search.
It seems to work well in the problems where I have tried it. Of course, I have no way of knowing whether I found (a point close to) the global optimum.
That sounds similar to a “multistart” algorithm (for which my favorite is MLSL). However, if only only do a local optimizer on one point at the end, and you filter the initial points heuristically, you are not guaranteed to get a global optimum in general.
As for Nelder–Mead (which has a big theoretical problem: it is not even guaranteed to converge to a local optimum), I’ve personally never found a case where it was the best derivative-free local search algorithm, compared to the other derivative-free algorithms in NLopt. YMMV, of course.
(The paper you cite found N–M to be the best, but they only compared it to two other local-optimization algorithms I wouldn’t typically reach to as my first choices. Indeed, one was the derivative-based algorithm DFPMIN from Numerical Recipes that they presumably used with finite-difference gradients(?), although they don’t explain how they used it. They are also comparing local-search algorithms on global optimization problems, which is kind of weird and it’s not clear to me how such a comparison is useful.)
Nelder-Mead is a great algorithm for unconstrained problems when conventional gradient-based methods are not applicable. When the algorithm works well, it’s really amazing.
It is not a great algorithm otherwise. I tell the students in my optimization class over and over that using something like Nelder-Mead because one is too lazy to compute a gradient is an excellent way to spend an extra year in graduate school. Many of then do not heed this wise advice.
There are other derivative-free methods that easily deal with bound constraints.
But there are lots of derivative-free methods for unconstrained optimization. Nelder–Mead is far from the only choice, and has the big drawback that it may not converge at all; the main advantage of Nelder–Mead (from my perspective) is that it is easy to implement if you are writing it yourself.
(I agree that it is amazing — it’s geometrically delightful to visualize the simplices “walking” through space.)
There are some variations of the Nelder-Mead method that are better. For example, this routine (Fortran90, implemented by D. E. Shaw) fits a quadratic surface to the points to chose the step size. That is a rational choice and improves a lot the method - I guess that NM with that improvement turns out to be a very robust alternative for minimization without derivatives. I do not know if the NM method implemented in the packages available here use the same strategy.
If you’re going to use quadratic fits, I would use something like Powell’s BOBYQA algorithm or Andy Conn’s variants from his DFO book, which are BFGS-like and have reasonable convergence guarantees.
(The basic issue is that the points on a simplex aren’t enough data to determine a quadratic fit — you can only do a linear fit, which is what Powell’s COBYLA algorithm does. You can only get a quadratic fit by iterative updates with some form of regularization, e.g. the BFGS-like minimum-norm-change proposed by Powell. And if you are doing fits for your update I wouldn’t call it a “Nelder–Mead” variant anyway.)
How so? My point is that you are usually better off with some more modern local optimization algorithm if you want to do local optimization. Comparing it to a global optimization algorithm, which will necessarily be much slower than local optimization, is irrelevant.
Thanks! Can you point me to some of those modern local optimization algorithms? And their implementations in Julia maybe? I am searching for such an algorithm because my function is very expensive to evaluate and I am not sure how to differentiate it. I don’t think automatic diff will work either. It is essentially a field-based calculation using mean-field self-consistent method to obtain the equilibrium structure of a density field. The energy is to be optimized with respect to the density distribution but it also has to be optimized with cell sizes (cell can be non-orthorhombic). I have a very good algorithm to locate the energy minimum at a fixed cell shape. Now the problem reduces to find the final energy minimum by varying 6 parameters of the cell shape (3 side lengths and 3 angles). By considering symmetry, sometimes the number of parameters can be reduced. For example, cubic cell shape only has one parameter to be optimized.
Thank you for that feedback. I understand the risk of boxing myself into a local optimum with my algorithm. It is definitely a concern.
I will try the NLopt derivative free algorithms that you suggest for the local search.
Thank you for all the comments, which are very helpful.
My objective function is non-differentiable and computational burdensome, which involves solving thousands of linear programming problems in each iteration. I find the differential evolution algorithm in BlackBoxOptim.jl to be the best fit. However, once I increase the number of parameters in the objective function, the D.E. algorithm can no longer find the optimum. It somehow gets stuck at some point. I tried most of the gradient-free algorithms available in Julia, but no one works. So I borrow the idea from “coordinate descent” and this seems to be promising. Basically, I successively optimize on a subset of the parameters while fixing other parameters. Since BlackBoxOptim.jl does not support the starting point, it becomes inefficient for my routine. That’s why I am trying the N-M algorithm.
Non-differentiability unfortunately excludes a lot of possible algorithms (and the fastest-converging ones). Can you reformulate it as a differentiable objective with a lot of linear or nonlinear constraints?