DiffEqGPU Trajectory Failure Handling and Heterogeneous Trajectories

tosm01 · July 22, 2025, 12:06pm

An update on this.

Regarding the fallback to CPU for batches of trajectories that take too long to run on the GPU, I found it’s more practical to handle this at the Bash level by setting a maximum wall-clock time. If the time limit is exceeded, I switch to running the batch on the CPU instead.
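For anyone wanting to try the same thing, here is a minimal sketch of what such a wrapper could look like. It is not my exact script: the function name `run_with_fallback` and the command strings are hypothetical, and it assumes GNU coreutils `timeout` is available, which exits with status 124 when the limit is hit. In this sketch any non-zero exit from the GPU run (timeout or crash) triggers the CPU rerun.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the GPU command under a wall-clock limit and
# rerun the same batch on the CPU if the GPU run times out or fails.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).
run_with_fallback() {
    local limit="$1" gpu_cmd="$2" cpu_cmd="$3"
    local status=0

    # Capture the exit status without aborting under `set -e`.
    timeout "$limit" bash -c "$gpu_cmd" || status=$?

    if [ "$status" -eq 0 ]; then
        return 0   # GPU run finished within the limit
    fi

    if [ "$status" -eq 124 ]; then
        echo "GPU run exceeded ${limit}; falling back to CPU" >&2
    else
        echo "GPU run failed (exit ${status}); falling back to CPU" >&2
    fi

    # Rerun the whole batch on the CPU; its exit status is returned.
    bash -c "$cpu_cmd"
}
```

A call might look like `run_with_fallback 600s "julia solve_batch.jl gpu" "julia solve_batch.jl cpu"`, where `solve_batch.jl` and its arguments are placeholders for whatever script runs one batch. Note that the whole batch is rerun on the CPU, so smaller batches waste less GPU work when one trajectory misbehaves.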

As for the failure itself, I initially suspected a divide-by-zero error, but I added checks to handle that case. It now seems more likely to be an overflow issue, especially since I get the same error when I multiply dx by a very large value in the equation. I’m not entirely sure what causes the error, but the CPU fallback has been an effective workaround.

Just thought I’d share this as a practical solution when working with EnsembleGPUKernel. There are definitely cases where the GPU is significantly faster than the CPU, but also cases where the CPU performs better.

That said, this fallback strategy can actually outperform relying solely on either the GPU or the CPU, especially when the probability of encountering problematic trajectories is low. The GPU bursts through the easy-to-solve trajectories, while the problematic ones are offloaded to the CPU.
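As a rough back-of-the-envelope check of when the fallback pays off against a CPU-only run, here is a simplified cost model. The symbols are my own assumptions and not measured: $p$ is the probability that a batch hits the time limit, $t_G$ is the GPU time for a well-behaved batch, $t_C$ is the CPU time for a batch (taken to be the same for all batches), and $\tau$ is the wall-clock limit, with $t_G < t_C$.

$$
\mathbb{E}[T_{\text{hybrid}}] = (1-p)\,t_G + p\,(\tau + t_C)
$$

$$
\mathbb{E}[T_{\text{hybrid}}] < t_C
\iff p\,\tau < (1-p)\,(t_C - t_G)
\iff p < \frac{t_C - t_G}{t_C - t_G + \tau}
$$

With illustrative numbers $t_G = 1$, $t_C = 10$, and $\tau = 2$ (minutes), the threshold is $9/11 \approx 0.82$, so under these assumptions the fallback beats a CPU-only run as long as fewer than roughly 82% of batches hit the limit. The comparison with a GPU-only run is simpler: the hybrid wins whenever a problematic batch would take longer than $\tau + t_C$ on the GPU, or fails outright.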
