FWIW, that’s just using a method for stiff ODEs on the same gradient flow, since the Jacobian of the gradient gives you the Hessian which is then used to step. This means Rosenbrock-W methods are quite close to doing exactly what you described. But yes, the issue is that their adaptivity is doing the wrong thing, since it assumes you really care about following the trajectory instead of finding the minimum value.
2 Likes