Best way to improve on local minima in nonlinear optimization?

What is the sentiment about this method?

It looks cool, though it might be slow. I know there was a follow-up paper that combined it with Bayesian optimization; I can find it if anyone is interested.

Hyper-parameter optimisation typically relies on derivative-free optimisation algorithms. Asking for an algorithm (especially a derivative-free one) that can optimise many hyper-parameters for almost any problem, without having many (or any) hyper-parameters of its own, is a bit unrealistic in my view. You can pretty much always formulate worst-case problems that make any given algorithm behave pathologically compared to other algorithms. Even gradient-based solvers typically require a fair bit of fiddling to find their best hyper-parameters. So this request sounds a lot like: I want the best optimisation algorithm, one that works well all the time with very little effort on my side. If such an algorithm existed, people would have stopped studying optimisation a long time ago :slight_smile: That’s not to say the method linked above is bad, but it will have to be benchmarked more thoroughly against other algorithms in the same class (of which there are many), over a wide range of problems, before it can be named a winner, if a winner can even be named.

This algorithm (AdaLIPO) looks very similar in spirit to the DIRECT algorithm, albeit with a more stochastic approach, and indeed on their test problems it is usually within a factor of 2 of DIRECT. Unfortunately this also means that it is probably only practical in low dimensions.
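
If anyone wants to try DIRECT on a low-dimensional problem, here is a minimal sketch using SciPy's implementation (scipy.optimize.direct, available since SciPy 1.9 if I remember correctly); the six-hump camel objective and the evaluation budget are just illustrative:

```python
from scipy.optimize import direct, Bounds  # scipy.optimize.direct needs SciPy >= 1.9

# Six-hump camel function: a standard 2-D test problem with several
# local minima and a global minimum of roughly -1.0316.
def camel(x):
    x1, x2 = x
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

bounds = Bounds([-3.0, -2.0], [3.0, 2.0])

# DIRECT only needs box bounds and an evaluation budget; there are no
# problem-specific tuning parameters to set.
res = direct(camel, bounds, maxfun=2000)
print(res.x, res.fun)
```

NLopt also ships DIRECT and DIRECT-L (GN_DIRECT, GN_DIRECT_L) if you are already using it.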

DIRECT is not without problems. Ref 14 of the paper @stevengj points to explains one of them. Having said that, I’ve had decent luck with DIRECT on a few problems (see ref 5 in the paper). In the limit, DIRECT explores all of the design space, so it is (as D. Jones, the inventor, says) a global algorithm. That is not great news in itself, as randomly throwing darts will do the same thing.

If you understand that the goal of algorithms like this is to find a better point than the one you have, rather than to spend far too much time chasing a true global optimum, they can make your life better.

There are a few papers that claim DIRECT is useful for large problems, but I’m not convinced.

Yes, but DIRECT tends to be much faster than pure random search and also helps ensure solution diversity — it essentially prioritizes searching not-yet-visited regions that have a combination of large diameter and low function value.
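
To make that selection rule concrete, here is a toy sketch in Python (my own simplification, not the actual DIRECT bookkeeping): a box is "potentially optimal" if it minimizes f - K*d for some K > 0, which picks out the positive-slope part of the lower convex hull of the (diameter, best function value) scatter. The real algorithm adds an epsilon test against the best value found so far and then trisects the selected boxes, both of which are omitted here.

```python
def potentially_optimal(diam, fval):
    # Toy version of DIRECT's selection rule: return the indices of boxes
    # that minimize f - K*d for some K > 0, i.e. the positive-slope part
    # of the lower convex hull of the (diameter, f) points.
    order = sorted(range(len(diam)), key=lambda i: (diam[i], fval[i]))
    hull = []
    for i in order:
        # Andrew's monotone chain for the lower hull in the (d, f) plane.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            d1, f1, d2, f2 = diam[j], fval[j], diam[k], fval[k]
            # Drop k if it lies on or above the segment from j to the new point.
            if (f2 - f1) * (diam[i] - d1) >= (fval[i] - f1) * (d2 - d1):
                hull.pop()
            else:
                break
        hull.append(i)
    # Only hull points supported by a line of positive slope (large diameter
    # AND low f) can minimize f - K*d with K > 0: keep the hull from the
    # lowest-f point out to the largest diameter.
    start = min(range(len(hull)), key=lambda m: fval[hull[m]])
    return hull[start:]

# Boxes with (diameter, best f): only the big or good ones get selected.
print(potentially_optimal([1, 1, 2, 2, 3], [0.5, 0.9, 0.3, 0.8, 0.6]))  # -> [2, 4]
```

Random search has no analogue of this: it keeps sampling regions it already knows are poor.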

Correct. DIRECT is one of many methods that perform better than theory predicts.

I don’t know what predictions you’re referring to. For general objective functions, AFAIK theory can only predict whether an algorithm converges and certain asymptotic properties — even for local optimization, you can devise problems where the algorithm takes arbitrarily long to converge (you only get stronger results about local-optimization convergence rate asymptotically when the objective is approximately quadratic, or for other special cases like LP etc.).

Anderson acceleration, for example, often does better than the asymptotic theory predicts. Asymptotic theory is not intended to describe the midrange of the iteration, where a lot of good things can happen. Quasi-Newton methods with line searches are another example, as is the good performance of DIRECT (when it’s happy). The Nelder-Mead algorithm, for which there is very little theory, is yet another.
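
If anyone wants to poke at a couple of these methods themselves, they are one-liners in SciPy; the Rosenbrock objective and starting point below are just stand-ins:

```python
from scipy.optimize import minimize, rosen

x0 = [1.3, 0.7, 0.8, 1.9, 1.2]

# Nelder-Mead: very little supporting theory, yet often perfectly serviceable.
nm = minimize(rosen, x0, method="Nelder-Mead",
              options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})

# BFGS with a line search: the convergence theory is local and asymptotic,
# but the midrange behaviour is usually what makes it useful in practice.
qn = minimize(rosen, x0, method="BFGS")

print(nm.nfev, nm.fun)
print(qn.nfev, qn.fun)
```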

Right, you have in mind examples of algorithms doing well in cases where the theory makes no predictions. To me, this is not “doing better than the theory” because there is no theory.

My Ax=b constraint was just a toy example. Our actual problem is more complex. Do you have a recommendation for a global optimization algorithm that will handle equality constraints? My linear algebra is a bit rusty, so I’ll look into the elimination procedure if there isn’t an easier option.
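
I can't say what the right algorithm is for your real problem, but since you mention the elimination procedure: for a linear equality constraint it is fairly mechanical. Here is a minimal sketch of the idea (the constraint matrix, right-hand side, and objective below are placeholders):

```python
import numpy as np
from scipy.linalg import null_space

# Eliminate a linear equality constraint A x = b: every feasible x can be
# written as x = x_p + N z, where x_p is any particular solution and the
# columns of N span the null space of A. Optimize over the smaller z instead.
A = np.array([[1.0, 1.0, 1.0]])   # placeholder constraint
b = np.array([1.0])

x_p, *_ = np.linalg.lstsq(A, b, rcond=None)   # a particular solution
N = null_space(A)                             # basis for {x : A x = 0}

def objective(x):                             # placeholder objective
    return np.sum((x - 0.3) ** 2) + np.sin(5 * x).sum()

def reduced_objective(z):
    return objective(x_p + N @ z)

# reduced_objective can now be handed to any unconstrained or
# box-constrained global optimizer (DIRECT, CMA-ES, multistart, ...).
```

One caveat: simple box bounds on x do not turn into box bounds on z, so this works best when the remaining constraints are easy to express in the reduced variables. For nonlinear equality constraints, people typically fall back on a penalty or augmented-Lagrangian wrapper around the global solver.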