As the title suggests, I’m currently wondering if Accelerated Gradient Descent and Momentum Gradient Descent should be (re)moved from Optim.
“Momentum methods” certainly have their uses, and machine learning has adopted them to avoid getting stuck at saddle points and shallow local minima. Still, I’m not sure the two methods are the best fit for Optim, at least not in their current form, and perhaps not in any form. We don’t really do SGD-type methods, and I doubt many people are using Optim in the situations where AGD and MGD are popular.
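For readers who haven’t met these methods: here is a minimal sketch of the classical (heavy-ball) momentum update that MGD-style methods build on. The function and parameter names are illustrative only; this is not Optim’s actual implementation or API.

```julia
# Minimal sketch of classical (heavy-ball) momentum gradient descent.
# Names (momentum_gd, α, β) are illustrative, not Optim internals.
function momentum_gd(∇f, x0; α = 1e-3, β = 0.9, iterations = 1_000)
    x = copy(x0)
    v = zero(x0)                    # velocity accumulator
    for _ in 1:iterations
        v = β .* v .- α .* ∇f(x)    # blend previous velocity with the new gradient step
        x = x .+ v                  # move along the accumulated velocity
    end
    return x
end

∇f(x) = 2 .* (x .- 3)              # gradient of f(x) = sum((x .- 3).^2)
momentum_gd(∇f, zeros(2))           # converges toward [3.0, 3.0]
```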
So, this post is just to get some feedback, and to see whether twenty people reply “Don’t remove them! I use them all the time!” or not. If we do remove them from Optim, they would move into another package; see https://github.com/JuliaNLSolvers/LegacyOptim.jl for how that might look. Of course, they would stay in Optim for a major version release cycle with proper deprecation warnings for their constructors, etc.
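For concreteness, a deprecation shim could look something like the sketch below. Forwarding to a `LegacyOptim` package mirrors the repo linked above, but the exact shape of the shim is my assumption, not a settled design.

```julia
# Hypothetical deprecation shim (assumption: the methods end up in a
# LegacyOptim.jl package as linked above; this is not a committed plan).
import LegacyOptim

function AcceleratedGradientDescent(args...; kwargs...)
    Base.depwarn("AcceleratedGradientDescent has moved to LegacyOptim.jl; " *
                 "use LegacyOptim.AcceleratedGradientDescent instead.",
                 :AcceleratedGradientDescent)
    return LegacyOptim.AcceleratedGradientDescent(args...; kwargs...)
end
```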