I have a rather hard nonlinear optimization problem: 12 variables, and I know the value of the function should be zero. How do I find the combination of 12 values that gives a very low residual?
So far I have tried Optim.jl and NLopt.jl. BFGS(linesearch=LineSearches.BackTracking(order=3)) gives the fastest result, but it is not accurate.
NLopt with :LN_BOBYQA works better, but it is very slow, and it also does not always converge.
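For reference, my Optim.jl call looks roughly like this (the objective here is just a placeholder standing in for my real 12-variable residual function):

```julia
using Optim, LineSearches

# Placeholder objective: stands in for the real 12-variable residual norm.
f(x) = sum(abs2, x .- 1.0)

x0 = zeros(12)
res = optimize(f, x0,
               BFGS(linesearch = LineSearches.BackTracking(order = 3)),
               Optim.Options(g_tol = 1e-12))
```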
So my preferred optimizer would have the following properties:
- do global optimization if it is stuck in a local minimum, but only then
- use gradient-based search otherwise
- use multiple cores
- terminate only if the norm of the residual is below a given threshold
One thing that is always worth trying is a simple multistart approach: run a gradient-based local optimizer multiple times from different random initial guesses, and keep the best solution. That is of course embarrassingly parallel.
And you can get a feeling for whether the global minimum was found by counting how many times the best solution was reached.
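A minimal sketch of such a multistart loop with Optim.jl, using threads for the parallelism (the objective f, the box bounds lo/hi, and the number of starts are all placeholders):

```julia
using Optim

# Multistart sketch: many independent local BFGS runs from random starting
# points inside the box [lo, hi], run in parallel over threads; keep the best.
function multistart(f, lo, hi; nstarts = 100)
    results = Vector{Any}(undef, nstarts)
    Threads.@threads for i in 1:nstarts
        x0 = lo .+ rand(length(lo)) .* (hi .- lo)  # random point in the box
        results[i] = optimize(f, x0, BFGS())
    end
    return results[argmin([minimum(r) for r in results])]
end
```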
For 12 variables I would use a smarter multistart approach like MLSL that has heuristics to avoid repeatedly searching the same local optimum. (In higher dimensions these techniques don’t work so well.)
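In NLopt.jl that might look roughly like this (the objective, the bounds, and all tolerances below are placeholders, not recommendations):

```julia
using NLopt

# Placeholder objective with an in-place gradient, as NLopt expects.
function myobj(x, grad)
    if length(grad) > 0
        grad .= 2 .* (x .- 1.0)
    end
    return sum(abs2, x .- 1.0)
end

opt = Opt(:G_MLSL_LDS, 12)          # MLSL with low-discrepancy start points
opt.lower_bounds = fill(-5.0, 12)
opt.upper_bounds = fill(5.0, 12)
opt.min_objective = myobj
opt.stopval = 1e-10                 # stop once the residual is small enough

local_opt = Opt(:LD_LBFGS, 12)      # gradient-based local searches
local_opt.xtol_rel = 1e-4           # fairly loose tolerance for local runs
opt.local_optimizer = local_opt

minf, minx, ret = optimize(opt, zeros(12))
```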
Note that the tolerance of the local optimizer is a crucial hyper-parameter in a multistart algorithm. You want it low enough that it gives a decent estimate of each local optimum, but not so low that it wastes a lot of time resolving poor local optima very accurately. You can always “polish” the solution afterwards by re-running a local optimizer at a lower tolerance from the best local optimum found.
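The polishing step is then just one more local run from the best point, at a tighter tolerance (reusing the hypothetical multistart function from the sketch above):

```julia
best     = multistart(f, lo, hi)
polished = optimize(f, Optim.minimizer(best), BFGS(),
                    Optim.Options(g_tol = 1e-14))
```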
Sounds like it (since “I know the result of the function should be zero”): in such cases it is usually better to use a root-finding algorithm (e.g. multidimensional Newton) directly.
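For example, a minimal sketch with NLsolve.jl (the residual below is just a stand-in for the real system of 12 equations):

```julia
using NLsolve

# Placeholder residual: the real problem would fill F with the 12 equations.
function residual!(F, x)
    F .= x .^ 2 .- 1.0      # simple system with a root at x = ones(12)
end

sol = nlsolve(residual!, fill(0.5, 12), method = :newton)
sol.zero                    # estimate of the root
```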
I do not disagree, but I think that algorithms which do not take advantage of some structure of the problem to guide the search always fall short of their promises. When the surface is simple, all of them work, and when it is too rough, none of them do. Usually I start with a simple random search, and if that is not enough, I try to understand the bounds and properties of each variable to design a smarter search for the specific function I am dealing with (in chemistry this often has a very clear structural meaning from which one can get insights).
Probably the important message is to always couple a global search with a gradient-based local optimizer when that is possible. Your notes on the choice of the convergence parameters for that are spot on.
I tried to exploit what I know about the problem. In particular, the sum of the lengths of the segments is tightly constrained, let’s say between l_min and 1.002 * l_min. But I have not found an elegant way to exploit that knowledge yet.
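For illustration only, one crude (certainly not elegant) option would be a quadratic penalty on the total length, added to the residual norm; residual_norm, seglengths, l_min, and the weight w below are all placeholders:

```julia
# Quadratic penalty: zero inside [l_min, 1.002 * l_min], growing outside,
# added to the residual norm with a tunable weight w.
function penalized(x; w = 1e3)
    s = sum(seglengths(x))                 # hypothetical helper
    p = max(0.0, s - 1.002 * l_min)^2 + max(0.0, l_min - s)^2
    return residual_norm(x) + w * p
end
```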
Very accurate, very fast. I use a straight line as the starting point and then let the solver modify the x and y values until it finds an equilibrium. I had to treat the positions as the unknowns and use the accelerations and velocities as the residual, which was not so clear to me in the beginning; that is why I used the norm of the residual first. But now I have a perfect solution, thank you!