An Idea for Optimization as if Solving an ODE

Thanks for the review paper. But the early reactions assumed that I don’t know the field or am just barely getting started. I mostly know how it evolved, and that I should not use steepest descent everywhere. My example is just a naive one chosen for ease and fast prototyping. I coded quasi-Newton methods to some extent while taking an optimization course, but of course there is plenty of room for me to improve.

As for the abandonment of SD, we can still build the du from a conjugate-gradient direction or from damped/momentum dynamics. I just wanted to explore the idea of incorporating an ODE solver in between and get feedback on research done along similar lines, or pointers to papers about it, not plain objections without concrete references like the first couple of responses.
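To make the proposal concrete, here is a minimal sketch of handing the gradient flow to an ODE solver. The Rosenbrock function, ForwardDiff for gradients, and OrdinaryDiffEq's ROCK2 are all illustrative choices of my own, not anything prescribed in the thread:

```julia
# Minimal sketch of "optimization as an ODE": integrate the gradient flow
#   du/dt = -∇f(u)
# with an off-the-shelf stabilized solver instead of taking fixed
# steepest-descent steps. Objective and solver choices are illustrative.
using OrdinaryDiffEq, ForwardDiff

rosenbrock(u) = (1 - u[1])^2 + 100 * (u[2] - u[1]^2)^2   # toy ill-conditioned objective

# Right-hand side: negative gradient (could instead be a CG or momentum direction)
grad_flow!(du, u, p, t) = (du .= -ForwardDiff.gradient(rosenbrock, u))

u0 = [-1.2, 1.0]
prob = ODEProblem(grad_flow!, u0, (0.0, 100.0))

# Stabilized explicit method (ROCK2) handles the stiffness from ill-conditioning;
# loose tolerances because we only care about the steady state, not the path.
sol = solve(prob, ROCK2(), reltol = 1e-3, abstol = 1e-3)
@show sol.u[end]          # should approach the minimizer [1, 1]
```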

I still think that utilizing ideas from ROCK-type methods (extensions to Runge-Kutta-Chebyshev for better stability in the complex plane) and other methods used for stabilizing stiff ODEs could be helpful, since I haven’t really found methods that are very effective for highly ill-conditioned local optimization problems where a nonlinear preconditioner is hard to specify. That said, since you don’t care about “accuracy”, you might as well use a low-order method, which means that implicit Euler or Rosenbrock-Euler methods might be the things to look at. And indeed, if you search the literature, you find that recent research on proximal optimization methods has a lot of links to implicit Euler.
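To make that link concrete (this is the textbook identity, not something stated in the thread): for a differentiable $f$, an implicit Euler step of size $h$ on the gradient flow $\dot{x} = -\nabla f(x)$ is exactly a proximal step,

$$
x_{k+1} = x_k - h\,\nabla f(x_{k+1})
\quad\Longleftrightarrow\quad
x_{k+1} = \operatorname{prox}_{hf}(x_k) = \arg\min_x \left\{ f(x) + \tfrac{1}{2h}\lVert x - x_k\rVert^2 \right\},
$$

since setting the gradient of the prox subproblem to zero gives $\nabla f(x_{k+1}) + \tfrac{1}{h}(x_{k+1} - x_k) = 0$.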


Can ROCK really beat optimised momentum descent, heavy ball, and relatives?

Why not mix them? I didn’t say use the ROCK method directly, I said use the ideas from it. I haven’t found momentum-based methods stable enough on any of the problems I’ve tried, but a ROCK-type idea would remedy that by linking a condition-number estimate to the number of stages and automatically choosing a stage count that keeps each step stable. If you do that on an ODE with momentum, you can probably get the stabilization of traditional momentum-based methods but be able to handle much higher condition numbers.
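To illustrate what I mean by linking a condition-number estimate to the stage count, here is a rough sketch. The $2s^2$ stability bound is the classical first-order Chebyshev one (higher-order variants use smaller constants), and none of these function names come from an existing package:

```julia
# Sketch of the ROCK/RKC-style stage-selection idea: estimate the largest
# Hessian eigenvalue via power iteration on Hessian-vector products, then
# pick the number of Chebyshev stages so that an explicit step of size h
# stays inside the real stability interval (length ≈ 2s^2 for s stages).
using LinearAlgebra, ForwardDiff

# Hessian-vector product by forward-mode differentiation of the gradient.
hvp(f, x, v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

# Crude power iteration for the spectral radius of the Hessian at x.
function spectral_radius_estimate(f, x; iters = 20)
    v = normalize(randn(length(x)))
    λ = 0.0
    for _ in 1:iters
        w = hvp(f, x, v)
        λ = norm(w)
        v = w / λ
    end
    return λ
end

# Stage count s such that the stability interval [-2s^2/h, 0] covers the
# stiffest eigenvalue: need h * λmax ≤ 2 s^2.
stages_for_step(h, λmax) = max(2, ceil(Int, sqrt(h * λmax / 2)))

f(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
x = [-1.2, 1.0]
λmax = spectral_radius_estimate(f, x)
@show stages_for_step(0.1, λmax)
```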

FWIW, the instability of momentum-based methods on highly ill-conditioned problems was causing us a ton of problems recently, so we started looking into alternatives. The solution we are using now is Hessian-free Newton-Krylov (which is why all of the differential equation solvers have second-order sensitivity analysis for Hv products), but I am not convinced that momentum methods can’t be tweaked to be more stable; Adam, RMSProp, Nesterov, etc. just didn’t seem to do it (BFGS is tricky as well because of the initial Hessian). I haven’t looked into whether there is anything that mixes proximal optimization, Hessian-vector products, and momentum methods specifically to target ill-conditioned problems, but if anyone has anything for this I’d be curious to get it set up with DiffEqFlux.
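For reference, here is a generic matrix-free Newton-Krylov sketch: hand-rolled CG on Hessian-vector products via ForwardDiff. This is just the idea, not the actual DiffEqFlux/sensitivity-analysis implementation:

```julia
# Matrix-free (Hessian-free) Newton–Krylov step: solve H(x) d = -g(x) with
# conjugate gradients, touching the Hessian only through Hv products.
using LinearAlgebra, ForwardDiff

hvp(f, x, v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

# Plain CG on the operator v -> H v; assumes H is positive definite near
# the current iterate (otherwise truncate or add damping).
function newton_krylov_direction(f, x; tol = 1e-8, maxiter = 100)
    g = ForwardDiff.gradient(f, x)
    d = zeros(length(x))
    r = -g - hvp(f, x, d)          # residual of H d = -g, starting from d = 0
    p = copy(r)
    rs = dot(r, r)
    for _ in 1:maxiter
        Hp = hvp(f, x, p)
        α = rs / dot(p, Hp)
        d .+= α .* p
        r .-= α .* Hp
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return d                        # approximate Newton direction
end

f(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
x = [-1.2, 1.0]
x .+= newton_krylov_direction(f, x)   # one (undamped) Newton step
```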

For DiffEqFlux, it might resemble adaptive control. Instead of evaluating the cost only at the final time, we could update the parameters as if they were additional states of the ODE, evolving alongside the normal states. The parameter derivatives could come from any of the optimizers. It might work, or at least find locally acceptable parameters at each time step.
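Here is a rough sketch of what I mean. The scalar toy model, the reference trajectory, the equation-error style instantaneous cost, and the gain η are all illustrative assumptions of mine, not a DiffEqFlux API:

```julia
# Treating parameters as extra ODE states ("adaptive control" flavor):
# the augmented state is (u, p); u follows the model dynamics with the
# current p, and p follows a gradient flow on an instantaneous
# equation-error cost against a known reference trajectory r(t) = exp(-2t).
using OrdinaryDiffEq, ForwardDiff

r(t)  = exp(-2t)                       # reference trajectory, true decay rate 2
dr(t) = -2 * exp(-2t)

# Instantaneous cost: how badly the current p explains the reference's slope.
inst_cost(p, t) = (-p * r(t) - dr(t))^2

function augmented!(dx, x, η, t)
    u, p = x
    dx[1] = -p * u                                                   # model state
    dx[2] = -η * ForwardDiff.derivative(q -> inst_cost(q, t), p)     # gradient flow on p
end

x0 = [1.0, 0.5]                        # u(0) = 1, initial parameter guess p(0) = 0.5
prob = ODEProblem(augmented!, x0, (0.0, 10.0), 20.0)   # η = 20 in the parameter slot
sol = solve(prob, Tsit5())
@show sol.u[end][2]                    # p should have drifted close to the true value 2
```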

I think a number of proximal algorithms are analogous to a number of DiffEq solvers. To use the Hessian or an approximate Hessian, I imagine one can take a quadratic approximation of the function or constraint, model that as a second-order cone constraint, and then use a projection-based proximal algorithm like the ADMM solver from https://github.com/oxfordcontrol/COSMO.jl. If you are careful, you may not need to construct the Hessian at all, just use an operator.
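For what it’s worth, here is one standard way to pose that quadratic model as a rotated second-order cone constraint (my own sketch, assuming the model Hessian $H_k$ is positive semidefinite with a factor $H_k = L_k L_k^\top$; nothing here is from COSMO’s documentation). With the model

$$
m_k(d) = f(x_k) + g_k^\top d + \tfrac12\, d^\top H_k d,
$$

the trust-region-style subproblem "minimize $t$ subject to $m_k(d) \le t$" becomes

$$
\min_{d,\,t,\,s}\ t
\quad \text{s.t.}\quad
s = t - f(x_k) - g_k^\top d,
\qquad
\big(s,\ 1,\ L_k^\top d\big) \in \mathcal{Q}_r,
$$

where $\mathcal{Q}_r = \{(u, v, w) : \lVert w\rVert^2 \le 2uv,\ u, v \ge 0\}$, since $\lVert L_k^\top d\rVert^2 = d^\top H_k d \le 2s$ recovers $m_k(d) \le t$. A splitting method like ADMM then only needs projections onto $\mathcal{Q}_r$; going fully matrix-free takes more care, since this particular form uses the factor $L_k$ explicitly.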