ODE solver algorithms that exploit massive parallelism (e.g. GPUs)?


My scenario: I'm solving a stiff ODE with O(100) variables over an ensemble of O(200) parameter sets. KenCarp4 and Rodas5 are the best solvers we've found. The problem is mildly sparse, but not especially disconnected, and on the CPU at least, using a sparse Jacobian is slower than a dense one due to overhead.
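For concreteness, here's a rough sketch of that setup (toy dynamics and illustrative names, not our actual model):

```julia
# Minimal stand-in for the setup: ~100 states, stiff, dense Jacobian.
using OrdinaryDiffEq

const N = 100

function rhs!(du, u, p, t)
    # Toy stiff dynamics; the real right-hand side is more coupled.
    @inbounds for i in 1:N
        du[i] = -p[1] * u[i] + p[2] * sin(t)
    end
    return nothing
end

u0   = ones(N)
prob = ODEProblem(rhs!, u0, (0.0, 10.0), [50.0, 1.0])
sol  = solve(prob, KenCarp4())   # Rodas5() is the other solver that works well
```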

We're working on moving this to the GPU, but even using something like EnsembleProblem to parallelize across the ensemble of 200 parameter sets leaves >95% of a modern GPU's threads idle in a given timestep, so there seems to be huge potential for further speedup.
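By "something like EnsembleProblem" I mean roughly the following, continuing the sketch above (`prob` as defined there). Note the DiffEqGPU constructor arguments vary by version; older releases accept `EnsembleGPUArray()` with no arguments, newer ones take a backend argument:

```julia
# Parallelize the same problem across 200 parameter sets. EnsembleGPUArray
# batches all trajectories into one GPU array computation per solver step.
using OrdinaryDiffEq, DiffEqGPU

params = [abs.(randn(2)) .* 50 for _ in 1:200]   # stand-in parameter sets
prob_func(prob, i, repeat) = remake(prob, p = params[i])

eprob = EnsembleProblem(prob, prob_func = prob_func)
esol  = solve(eprob, Rodas5(), EnsembleGPUArray(), trajectories = 200)
# 200 trajectories × 100 states ≈ 20k parallel work items per step,
# which still leaves most of a modern GPU idle.
```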

Are there stiff algorithms available that exploit this parallelism further? I know Runge-Kutta methods (not what we're using here) have parallel tableaus, but even those only offer small-factor speedups. Is there anything better? Given that at every time step, for every parameter set, I can evaluate my ODE function at ~100 extra values of t and u essentially for free on a modern GPU, it seems I should be able to reduce the number of time steps needed. Or are there other strategies one could use to saturate a GPU and speed this up? Thanks for any advice.
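To illustrate the "extra evaluations for free" point: on a GPU, batching K right-hand-side evaluations into one fused broadcast kernel costs about the same as a single evaluation. A sketch with the same toy dynamics, assuming CUDA.jl:

```julia
# Evaluate the RHS at K candidate (t, u) points in a single fused kernel.
using CUDA

N, K = 100, 128
U  = CUDA.rand(Float32, N, K)    # each column is a candidate state u
ts = CUDA.rand(Float32, 1, K)    # one candidate time per column
p1, p2 = 50.0f0, 1.0f0

dU = @. -p1 * U + p2 * sin(ts)   # N*K evaluations, one kernel launch
```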


https://github.com/SciML/DiffEqGPU.jl/pull/148 might be of interest, although it's not a stiff solver.
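Roughly, the idea there is to compile the entire (non-stiff) solve into a single GPU kernel, one trajectory per thread, instead of launching a kernel per solver operation. A sketch of how that interface looks in DiffEqGPU.jl; the names `GPUTsit5` and `EnsembleGPUKernel` are from later releases and may differ from the PR itself:

```julia
using OrdinaryDiffEq, DiffEqGPU, StaticArrays, CUDA

# Out-of-place RHS on static arrays, so each trajectory fits in one GPU thread.
function lorenz(u, p, t)
    du1 = p[1] * (u[2] - u[1])
    du2 = u[1] * (p[2] - u[3]) - u[2]
    du3 = u[1] * u[2] - p[3] * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector Float32[1.0, 0.0, 0.0]
p  = @SVector Float32[10.0, 28.0, 8/3]
prob  = ODEProblem{false}(lorenz, u0, (0.0f0, 10.0f0), p)
eprob = EnsembleProblem(prob, prob_func = (prob, i, repeat) ->
            remake(prob, p = p .* (0.9f0 + 0.2f0 * rand(Float32))))

sol = solve(eprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()),
            trajectories = 200, adaptive = false, dt = 0.01f0)
```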


Yeah, that's the direction we're heading. It's an improvement of over 100x for the non-stiff case; we need to do the same for the stiff ODE case.


Thanks for the replies. Is there any reference describing what these GPU versions of the methods do, exactly?

Not right now; they're still pre-paper, so no write-up exists yet.