I mean call your cost function (in this case cost_twostage) with two arguments. The one you synthesised using DiffEqParamEstim. If you call it with two arguments, it mutates the second argument to give the gradient. Which gives an idea of how long each individual gradient calculation is taking
If even the two stage method is resulting in long times, then the gradient might be numerically bad. So the optimisation algorithm gets confused/makes bad steps and takes ages to settle. Why are you setting mpg_autodiff = false? That reverts to a finite difference gradient, which can be really inaccurate in the type of problem you have.
Sorry I haven’t used the two stage method of the package before. I implemented my own (almost surely worse) version of it a while ago. It will always be really quick as you’re not differentiating through the ODE solution to get the gradient.