This looks like a misunderstanding of how to seed an RNG. You are resetting the seed to the same index-dependent value every time you invoke the kernel, so you’re not generating fresh random numbers; each thread reproduces the same numbers on every launch. (Or something less predictable is happening: it’s not clear to me whether setting a different seed per thread, as you’re attempting here, is well-defined for the device RNG.) I think this explains the unexpected results you’re seeing in your rand() + seed experiments.
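
To make the reseeding problem concrete, here is a minimal CPU-only sketch using the `Random` standard library. It is an analogy, not your kernel and not the CUDA.jl device RNG: `reseeded_draw` and `seeded_once_draws` are hypothetical names I made up, and `MersenneTwister` stands in for whatever generator the device actually uses.

```julia
using Random

# Hypothetical stand-in for a kernel that reseeds on every invocation:
# a fresh generator with the same seed always restarts the same sequence.
function reseeded_draw(seed)
    rng = MersenneTwister(seed)
    return rand(rng, 3)
end

# Seed once, then keep drawing from the same generator across calls.
function seeded_once_draws(seed, ncalls)
    rng = MersenneTwister(seed)
    return [rand(rng, 3) for _ in 1:ncalls]
end

a = reseeded_draw(42)
b = reseeded_draw(42)
println(a == b)                # the two "random" draws are identical

draws = seeded_once_draws(42, 2)
println(draws[1] == draws[2])  # the stream advanced, so the draws differ
```

The first pattern is what reseeding inside the kernel amounts to on each launch; the second is the usual "seed once, then draw" pattern.
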
If you want to seed the RNG manually, you should do it once at the start of your computation. Perhaps @maleadt could provide more clarity about the right way to do this? Does CUDA.default_rng() take a single global seed, or are there independent streams with independent seeds, say, one per thread in a warp?