Simulation running well on K620 but unstable on A100?

Yeah, don’t do that 🙂 A quick fix is to put `CUDA.@atomic` in front of those additions, so that threads updating the same location no longer race with each other. That will make it slower, though. A better solution is to compute interim sums per thread, aggregate those across the block, and only then perform atomic additions at the grid level; but that’s of course much more invasive. Depending on the exact characteristics of your kernel, `CUDA.@atomic` might perform well enough for you.
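To make the two options concrete, here is a minimal sketch. It is not your kernel: the names `sum_atomic!` and `sum_blockwise!`, the single-element accumulator `out`, and the fixed block size of 256 threads are all assumptions made for illustration. The first kernel is the quick fix, the second reduces within each block in shared memory and issues only one atomic addition per block.

```julia
using CUDA

# Quick fix: every thread adds atomically to the same global location.
function sum_atomic!(out, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        CUDA.@atomic out[1] += x[i]
    end
    return nothing
end

# More invasive: reduce within the block first, then one atomic per block.
# Assumes the kernel is launched with exactly 256 threads per block.
function sum_blockwise!(out, x)
    tid = threadIdx().x
    i = (blockIdx().x - 1) * blockDim().x + tid

    # Per-block scratch space; size must match the launch configuration.
    shared = CuStaticSharedArray(Float32, 256)
    shared[tid] = i <= length(x) ? x[i] : 0f0
    sync_threads()

    # Tree reduction in shared memory (block size is a power of two).
    stride = blockDim().x ÷ 2
    while stride >= 1
        if tid <= stride
            shared[tid] += shared[tid + stride]
        end
        sync_threads()
        stride ÷= 2
    end

    # Only the first thread of each block touches global memory.
    if tid == 1
        CUDA.@atomic out[1] += shared[1]
    end
    return nothing
end

x = CUDA.ones(Float32, 10_000)
out_a = CUDA.zeros(Float32, 1)
out_b = CUDA.zeros(Float32, 1)

threads = 256
blocks = cld(length(x), threads)

@cuda threads=threads blocks=blocks sum_atomic!(out_a, x)
@cuda threads=threads blocks=blocks sum_blockwise!(out_b, x)

result_atomic = Array(out_a)[1]
result_blockwise = Array(out_b)[1]
```

Both kernels should give the same total here; the difference is the number of atomic operations hitting the same address (one per element versus one per block), which is where the speed-up of the second variant would come from. Whether it is worth the extra complexity depends on how contended your additions actually are.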
