Why not mix them? I didn’t say to use the ROCK method directly; I said to use the ideas from it. I haven’t found momentum-based methods stable enough on any of the problems I’ve tried, but a ROCK-type idea would remedy that by linking a condition number estimate to the number of stages and automatically choosing a stage count that ensures each step is stable. If you do that on an ODE with momentum, you can probably get the stabilization provided by traditional momentum-based methods while being able to handle much higher condition numbers.
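To make the "link the condition number estimate to the stage count" idea concrete, here is a rough Python sketch. It is not ROCK itself: it uses the plain undamped first-order Chebyshev recurrence, whose stability interval on the negative real axis is [-2s^2, 0], and it applies it to the plain gradient flow x' = -grad f(x) rather than to a momentum ODE. The names `stages_for` and `chebyshev_step`, the safety factor, and the assumption that an estimate `lip` of the largest Hessian eigenvalue is already available are all just for illustration.

```python
import math

def stages_for(h, lip, safety=0.95):
    # Smallest s with h * lip <= safety * 2 * s**2, i.e. the step lands inside
    # the real stability interval [-2 s^2, 0] of the undamped Chebyshev method.
    # `lip` is an assumed estimate of the largest Hessian eigenvalue.
    return max(1, math.ceil(math.sqrt(h * lip / (2.0 * safety))))

def chebyshev_step(grad, x, h, s):
    # One s-stage step for x' = -grad(x), using the three-term recurrence
    # g_j = 2 g_{j-1} - g_{j-2} - (2h/s^2) grad(g_{j-1}).
    g_prev = list(x)
    g_curr = [xi - (h / s**2) * gi for xi, gi in zip(x, grad(x))]
    for _ in range(2, s + 1):
        gr = grad(g_curr)
        g_next = [2.0 * c - p - (2.0 * h / s**2) * gi
                  for c, p, gi in zip(g_curr, g_prev, gr)]
        g_prev, g_curr = g_curr, g_next
    return g_curr

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

# Toy ill-conditioned quadratic f(x) = 0.5 * sum(eig_i * x_i^2).
eigs = [1.0, 10.0, 1e3, 1e6]
grad = lambda x: [l * xi for l, xi in zip(eigs, x)]

h = 1e-2
s = stages_for(h, max(eigs))   # stage count chosen from the stiffness estimate
x0 = [1.0, 1.0, 1.0, 1.0]
x = list(x0)
x_euler = list(x0)
for _ in range(20):
    x = chebyshev_step(grad, x, h, s)
    # Plain gradient descent with the same step size, for comparison.
    x_euler = [xi - h * gi for xi, gi in zip(x_euler, grad(x_euler))]

print("stages:", s)
print("Chebyshev norm:", norm(x), "Euler norm:", norm(x_euler))
```

With the same step size, the single-stage update (plain gradient descent) diverges on this problem while the multi-stage step stays bounded. A real ROCK-type method adds damping so that stiff components actually decay rather than just staying bounded, and it estimates the spectral radius on the fly instead of taking it as an input.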
FWIW the instability of momentum-based methods on highly ill-conditioned problems was causing us a ton of problems recently, so we started looking into alternatives. The solution we are using now is Hessian-free Newton-Krylov (which is why all of the differential equation solvers have second-order sensitivity analysis for Hv products). I am not convinced that momentum methods can’t be tweaked to be more stable; it’s just that ADAM, RMSProp, Nesterov, etc. didn’t seem to do it (BFGS is tricky as well because of the initial Hessian). I haven’t looked into whether there is anything that mixes proximal optimization, Hessian-vector products, and momentum methods specifically to target ill-conditioned problems, but if anyone has anything for this, I’d be curious to get it set up with DiffEqFlux.
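For anyone who hasn't seen Hessian-free Newton-Krylov, here is a minimal Python sketch of the idea: solve H p = -g with conjugate gradients, touching the Hessian only through Hv products. As a stand-in for the second-order sensitivity analysis mentioned above, this sketch uses a finite difference of the gradient for Hv. The names `hvp_fd` and `newton_cg_step`, the fixed `eps`, and the tolerances are illustrative assumptions, and this is not how DiffEqFlux does it.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hvp_fd(grad, x, v, eps=1e-6):
    # Finite-difference stand-in for a Hessian-vector product:
    # H v ~ (grad(x + eps * v) - grad(x)) / eps
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def newton_cg_step(grad, x, cg_iters=50, rtol=1e-8):
    # Approximately solve H p = -g by conjugate gradients, never forming H.
    g = grad(x)
    p = [0.0] * len(x)
    r = [-gi for gi in g]          # residual of H p = -g at p = 0
    d = list(r)
    rr = dot(r, r)
    tol = rtol * math.sqrt(rr)
    for _ in range(cg_iters):
        if math.sqrt(rr) <= tol:
            break
        Hd = hvp_fd(grad, x, d)
        dHd = dot(d, Hd)
        if dHd <= 0.0:             # non-positive curvature: stop early
            break
        alpha = rr / dHd
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return [xi + pi for xi, pi in zip(x, p)]

# Toy ill-conditioned quadratic f(x) = 0.5 * sum(eig_i * x_i^2).
eigs = [1.0, 1e3, 1e6]
grad = lambda x: [l * xi for l, xi in zip(eigs, x)]

x0 = [1.0, 1.0, 1.0]
x1 = newton_cg_step(grad, x0)
g0_norm = math.sqrt(dot(grad(x0), grad(x0)))
g1_norm = math.sqrt(dot(grad(x1), grad(x1)))
print("gradient norm before:", g0_norm, "after:", g1_norm)
```

On a quadratic a single step does most of the work regardless of the condition number, which is the appeal over momentum methods here. On a real problem you would want a line search or trust region around the step, a better Hv product than finite differences, and probably a preconditioner for the CG solve.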