For a biology model I used code written in Fortran 90 (f90) that worked fine, but for better readability by future readers of the corresponding paper, I translated it into Julia. In my f90 code, I used minimization routines by Nelder-Mead and Powell (Powell was actually better) from the 1960s-70s to determine the parameters of the model.
But the corresponding routines from your package Optim.jl disappointed me:

- The Nelder-Mead routine is fast enough, but it does not check whether the output is a true minimum, so it sometimes gave incorrect results.
- The Powell-like minimization routines (without gradients) in Julia are at least 100 times slower than the f90 version and were too slow to be useful for me.
Here are my questions:

1. Why did you not include a check of the minimum property, as in the early Fortran Nelder-Mead codes? The Nelder-Mead code was quite popular in the past, most likely because it keeps searching, within prescribed limits, until a true minimum is found.
2. Why are your Powell-like gradient-free algorithms so slow?
Searching for model parameters by minimizing the norm of a discrepancy between theory and experiment is a standard procedure, and I suspect that other people who tried Julia have been put off by the difficulties I encountered.
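To make this concrete, here is a minimal sketch of the kind of fit I mean, with a toy model and made-up data (not my actual biology model), plus the sort of restart check the old Fortran codes performed:

```julia
using Optim

# Toy stand-in for "theory vs experiment": y ≈ p[1] * exp(-p[2] * t)
t = 0.0:0.5:5.0
y = 2.0 .* exp.(-0.7 .* t) .+ 0.01 .* randn(length(t))

# Objective: squared norm of the discrepancy between model and data
discrepancy(p) = sum(abs2, p[1] .* exp.(-p[2] .* t) .- y)

res = optimize(discrepancy, [1.0, 1.0], NelderMead())

# Crude check in the spirit of the old Fortran codes: restart from the
# reported minimizer; if the value still drops, it was not a true minimum.
res2 = optimize(discrepancy, Optim.minimizer(res), NelderMead())
@show Optim.minimum(res) Optim.minimum(res2)
```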
The real question is, why include Nelder–Mead? Why use it? It’s an obsolete algorithm that is not guaranteed to converge, and there are plenty of other derivative-free local-optimization algorithms these days. There isn’t an easy check that ensures that it is globally convergent (i.e. always converges to some local minimum, not necessarily to a global minimum) — there are globally convergent Nelder–Mead variants, but they involve much more substantial changes IIRC.
I included it in NLopt, but really only for comparison purposes, and I recommend that people only use it for this purpose. I don’t know why Optim.jl uses it by default for gradient-free problems.
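As a rough sketch of what "only for comparison" could look like with NLopt.jl (the standard `Opt` interface; `:LN_BOBYQA` here stands in for a Powell-style model-based alternative):

```julia
using NLopt

# Rosenbrock as a stand-in objective; grad is ignored by the
# derivative-free (:LN_*) algorithms.
rosenbrock(x, grad) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

function run_alg(alg)
    opt = Opt(alg, 2)              # 2-dimensional problem
    opt.min_objective = rosenbrock
    opt.xtol_rel = 1e-8
    return optimize(opt, [0.0, 0.0])  # (minf, minx, ret)
end

@show run_alg(:LN_NELDERMEAD)
@show run_alg(:LN_BOBYQA)
```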
What “Powell-like” routine are you referring to? I don’t see one in Optim.jl? (When you say “100 times slower”, do you mean that it converges in 100x more iterations, or that each iteration is 100x slower?)
I’m curious what the Nelder-Mead Fortran library you refer to is actually doing when it “check[s] whether the output is a true minimum.” For nonconvex optimization, finding (and verifying) global optimality is a difficult problem. I suspect that what the code is actually doing is checking whether it is at a local minimum.
Even that is not so easy in a derivative-free algorithm. You could use finite differences, but that involves 2n more function evaluations in n dimensions (for central differences), and its reliability is limited by the choice of step size.
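Concretely, such a check might look like the sketch below (my own illustration, not code from any particular library). Note that a small gradient is only a necessary condition for a local minimum, not a proof of one:

```julia
# Central-difference stationarity check at a candidate minimizer x.
# Costs 2n extra function evaluations; both h and tol are judgment calls.
function looks_stationary(f, x; h = 1e-6, tol = 1e-4)
    n = length(x)
    g = zeros(n)
    for i in 1:n
        e = zeros(n)
        e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h)  # central difference
    end
    return maximum(abs, g) <= tol  # small gradient: necessary, not sufficient
end
```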
The reason why you might want such a check in Nelder–Mead is that the Nelder–Mead algorithm is known to be flawed: it might not converge to a local minimum. That’s why I say it is obsolete and should typically be avoided.
Yes, sure, but that Nelder-Mead result is also restricted to the standard implementation, as you mentioned above (I don't know exactly what is implemented in Optim.jl). I was just pointing out that such theoretical results may not be fundamental in practice, either because the failure cases are exceptional or because the actual implementations of these methods are already variants that deal with such issues.
(And by that I don’t mean that the OP should be using it instead of a better method)
Yes, of course. But Nelder-Mead variants with convergence guarantees also exist.
(The Nelder-Mead implemented in Optim.jl is indeed a more modern variant. It doesn't seem to have convergence results, but it behaves better than the classical method.)
I think we should go back to this question, because it seems very likely that someone translating code from Fortran might have done some Fortran-style things that absolutely hammer Julia performance, for example using global variables.
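For example (a generic illustration of the trap, not the OP's actual code), an objective that reads a non-constant global is compiled against an unknown type and is slow on every evaluation:

```julia
data = randn(10_000)                   # non-constant global: type-unstable

slow_obj(p) = sum(abs2, data .- p[1])  # reads the untyped global: slow

# Fix 1: declare the global `const` so its type is known to the compiler
const cdata = randn(10_000)
fast_obj(p) = sum(abs2, cdata .- p[1])

# Fix 2 (more flexible): close over a local variable and pass the closure
make_obj(d) = p -> sum(abs2, d .- p[1])
obj = make_obj(randn(10_000))
```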
Thanks to @lmiq and @pnavaro for the pointers to PRIMA. I hope it will be useful to the Julia community.
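For anyone who wants to try these solvers from Julia right away, there is a PRIMA.jl wrapper around the Fortran backends; a rough usage sketch (the exact entry points and return values may differ between versions):

```julia
using PRIMA

f(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2  # Rosenbrock test function

# NEWUOA: Powell's derivative-free method for unconstrained problems
x, info = PRIMA.newuoa(f, [0.0, 0.0])
@show x info
```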
As a maintainer of Powell's solvers, I am delighted that you, @pkofod, are interested in these methods. However, I would suggest never implementing these methods from scratch by only looking at the book you mentioned or any other literature. The implementation of these methods is notoriously HARD. Powell mentioned that "The development of NEWUOA has taken nearly three years. The work was very frustrating". For more elaboration on this particular point, you may refer to my talk at the 10th International Congress on Industrial and Applied Mathematics (check the slide titled "Implementation of these methods is HARD" and those after it).
Unless you want to spend many frustrating years, as Powell (and I) did, implementing these methods, it is better to use an existing implementation as a reference. The very first objective of PRIMA is to provide such a reference implementation.
A native Julia implementation of these methods is highly desirable. It is within the plan of PRIMA. It would be great if someone from the Julia community gets sufficiently interested to take the initiative.