My scenario: I'm solving a stiff ODE with O(100) variables over an ensemble of O(200) parameters. KenCarp4 and Rodas5 are the best solvers we've found. The problem is mildly sparse, but not very disconnected, and on CPU, at least, using a sparse Jacobian is slower due to overhead.
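For concreteness, here's a minimal sketch of the kind of setup I mean; the RHS, sizes, and sparsity pattern below are placeholders, not our actual model:

```julia
using OrdinaryDiffEq, SparseArrays, LinearAlgebra

function f!(du, u, p, t)                     # stands in for our ~100-variable stiff RHS
    @. du = -p[1] * u
    return nothing
end

u0    = rand(100)
tspan = (0.0, 10.0)
p     = [2.0]

prob = ODEProblem(f!, u0, tspan, p)
sol  = solve(prob, KenCarp4())               # dense Jacobian: what we currently use

# Passing a sparsity pattern switches to a sparse Jacobian/factorization, but at
# this size the extra overhead makes it slower for us:
jac_sparsity = sparse(1.0I, 100, 100)        # placeholder pattern; ours is denser
fsp  = ODEFunction(f!; jac_prototype = jac_sparsity)
sol2 = solve(ODEProblem(fsp, u0, tspan, p), KenCarp4())
```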
We're working on getting this onto the GPU, but even using something like EnsembleProblem to parallelize across the ensemble of 200 parameters leaves >95% of a modern GPU's threads idle in a given timestep, so it seems like there's huge potential for further speedup.
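Roughly the GPU ensemble setup I'm describing; param_sets and the RHS are placeholders, and the exact EnsembleGPUArray constructor may differ between DiffEqGPU.jl versions:

```julia
using OrdinaryDiffEq, DiffEqGPU, CUDA

function f!(du, u, p, t)                     # placeholder RHS standing in for our model
    @. du = -p[1] * u
    return nothing
end

u0, tspan  = rand(100), (0.0, 10.0)
param_sets = [rand(1) for _ in 1:200]        # placeholder for our 200 parameter sets

prob      = ODEProblem(f!, u0, tspan, param_sets[1])
prob_func = (prob, i, repeat) -> remake(prob, p = param_sets[i])
ensprob   = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

# Depending on the model, the implicit solver may need autodiff = false or an
# explicit GPU-compatible Jacobian here.
sol = solve(ensprob, Rodas5(autodiff = false), EnsembleGPUArray(CUDA.CUDABackend());
            trajectories = 200, saveat = 0.1)
```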
Are there stiff algorithms available that further exploit this parallelism? I know Runge-Kutta (not that we're using it here) has parallel tableaus, but even those only really offer small-factor speedups. Is there anything better? Given that at every time step, for every parameter, I can pretty much evaluate my ODE function at 100 extra values of t and u for free on a modern GPU, it seems like I should be able to reduce the number of time steps needed? Or are there any other strategies one could use to saturate a GPU and speed this up? Thanks for any advice.
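To make the "free evaluations" point concrete, here's a toy illustration (sizes and the batched RHS are made up): broadcasting one RHS over every ensemble member times 100 extra trial states is a single fused kernel, so the extra columns cost almost nothing.

```julia
using CUDA

nstates, nensemble, nextra = 100, 200, 100
U = CUDA.rand(Float64, nstates, nensemble * nextra)   # one column per (member, trial point)
P = CUDA.rand(Float64, 1, nensemble * nextra)         # matching parameter per column

batched_rhs(U, P) = @. -P * U                         # placeholder RHS, applied column-wise

dU = batched_rhs(U, P)                                # one broadcast kernel over the whole batch
```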