The real question is, why include Nelder–Mead? Why use it? It’s an obsolete algorithm that is not guaranteed to converge, and there are plenty of other derivative-free local-optimization algorithms these days. There isn’t an easy tweak that makes it globally convergent (i.e. always converging to *some* local minimum, though not necessarily to a global minimum); the globally convergent Nelder–Mead variants involve much more substantial changes, IIRC.
I included it in NLopt, but really only for comparison purposes, and I recommend that people use it only for that. I don’t know why Optim.jl uses it by default for gradient-free problems.
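In case it helps, here’s a minimal sketch of the kind of comparison I mean, using NLopt.jl; the Rosenbrock test function, starting point, and tolerances below are just illustrative choices on my part, not anything specific to this thread:

```julia
# Sketch: pit Nelder–Mead against other derivative-free NLopt algorithms.
# The test problem and tolerances are arbitrary choices for illustration.
using NLopt

# Standard 2-D Rosenbrock function. `grad` is ignored because all of the
# algorithms below are derivative-free.
rosenbrock(x, grad) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

function run_alg(algorithm)
    opt = Opt(algorithm, 2)        # 2-dimensional problem
    opt.min_objective = rosenbrock
    opt.xtol_rel = 1e-8            # relative tolerance on x
    opt.maxeval = 10_000           # cap on function evaluations
    minf, minx, ret = optimize(opt, [-1.2, 1.0])
    println("$algorithm: f = $minf at $minx after $(opt.numevals) evaluations ($ret)")
end

run_alg(:LN_NELDERMEAD)  # classic Nelder–Mead simplex
run_alg(:LN_SBPLX)       # Rowan's Subplex, a simplex variant on subspaces
run_alg(:LN_BOBYQA)      # Powell's BOBYQA (quadratic trust-region model)
```

Since every algorithm goes through the same `Opt` interface, swapping methods is a one-symbol change, which is what makes this kind of head-to-head comparison cheap.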
What “Powell-like” routine are you referring to? I don’t see one in Optim.jl. (When you say “100 times slower”, do you mean that it converges in 100x more iterations, or that each iteration is 100x slower?)