In a model for biology I used a code written in f90 that worked fine, but for greater readability by the future readers of the corresponding paper, I translated it into Julia. In my f90 code, I used minimization routines by Nelder-Mead and Powell (Powell was actually better) of the 1960-70 to determine the parameters of the model.
But the corresponding routines from your package Optim.jl disappointed me :
the Nelder-Mead routine is fast enough, but it does not check whether the output is a true minimum, so it sometimes gave incorrect results -
the Powell like minimization routines (without gradients) of Julia are at least 100 times slower than the f90 version and were too slow to be useful for me.
Here are my questions :
Why did you not include a check of the minimum property, as in early fortran Nelder-Mead codes? The Nelder-Mead code was quite popular in the past, most likely because it keeps searching - within prescribed limits - until a true minimum is found.
Why are your Powell like non gradient algorithms so slow ?
Searching for model parameters by minimizing the norm of a discrepancy between theory and experiment is a standard procedure and I suspect that other people that tried Julia have been put off by the difficulties that I encountered.
Thank you in advance for your comments !
The real question is, why include Nelder–Mead? Why use it? It’s an obsolete algorithm that is not guaranteed to converge, and there are plenty of other derivative-free local-optimization algorithms these days. There isn’t an easy check that ensures that it is globally convergent (i.e. always converges to some local minimum, not necessarily to a global minimum) — there are globally convergent Nelder–Mead variants, but they involve much more substantial changes IIRC.
I included it in NLopt, but really only for comparison purposes, and I recommend that people only use it for this purpose. I don’t know why Optim.jl uses it by default for gradient-free problems.
What “Powell-like” routine are you referring to? I don’t see one in Optim.jl? (When you say “100 times slower”, do you mean that it converges in 100x more iterations, or that each iteration is 100x slower?)
I’m curious what the Nelder-Mead Fortran library you refer to is actually doing when it “check[s] whether the output is a true minimum.” For nonconvex optimization, finding (and verifying) global optimality is a difficult problem. I suspect that what the code is actually doing is checking whether it is at a local minimum.
Even that is not so easy in a derivative-free algorithm. You could use finite differences, but that involves 2n more function evaluations in n dimensions, and its reliability is limited by a choice of step size.
The reason why you might want such a check in Nelder–Mead is that the Nelder–Mead algorithm is known to be flawed: it might not converge to a local minimum. That’s why I say it is obsolete and should typically be avoided.
I made it the default because R did so. But I would love to see other derivative-free algorithms in Optim replace it.
I would love to include Powell-style algorithms. Just havn’t had the time. I bought https://epubs.siam.org/doi/book/10.1137/1.9780898718768 but never really found time to devote to the topic.
Good point. So what does the Fortran library do?
This is more annedoctal than anything, but BFGS methods can also fail to converge (and are pretty much the standard methods for large problems because are good enough in practice): The BFGS method with exact line searches fails for non-convex objective functions | SpringerLink.
It’s been known for a couple of decades how to ensure that BFGS is globally convergent on nonconvex problems (Li and Fukushima, 2001).
Yes sure, but the Nelder Mead result is also restricted to a standard implementation, as you mentioned above (I don’t know what exactly is implemented in Optim.jl). I was just pointing that such theoretical results may not be fundamental in practice (for their exceptionality or for the actual implementations of such methods being already variants that deal with such issues).
(And by that I don’t mean that the OP should be using it instead of a better method)
I disagree. Sometimes one needs a certain guarantee that a local minimum is found (or a saddle, or whatever you’re looking for).
Yes, of course. But Nelder Mead variants with convergence guarantees also exist.
(The Nelder Mead implemented in Optim.jl is a more modern improvement, indeed. It doesn’t seem to have convergence results, though, but behaves better than the classical method)
I agree. For my particular problem, NM actually delivers the best results.
I think we should go back to this question, because it seems very likely that someone translating a code from Fortran might have done some Fortran style things that would absolutely hammer the Julia performance, like for example using global variables.
Without an example from Op it is pure guesswork.
Cool, I had not been able to find implementations with bsd or mit licenses before