Output distribution of rand(Float32) and rand(Float64), thread 2

Hah, I was under the same impression. This is wrong, though, and @maleadt corrected me in the thread "Kernel random numbers generation entropy / randomness issues".

So CPU and GPU currently use different algorithms (xoshiro vs philox) to arrive at the same (questionable) distribution for unqualified rand().

A divergence between the distributions would make it harder to port code from CPU to GPU without introducing very subtle accuracy bugs. In particular, people who are unaware of the whole issue would find it harder to test code for correctness and accuracy on CPU before deploying to GPU.

This is a real problem!

But we must swallow some frog; we’re only haggling about which one is more palatable. I guess divergence is grudgingly acceptable. But I’m not a GPU person, so my opinion on that is not qualified enough to be taken seriously.

(The two frogs would be: throw the 1/rand(T) people under the bus immediately on CPU, or wait until they have tested their algorithms to their satisfaction and deployed to GPU, then present them with a little surprise debugging puzzle.)
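
To make the 1/rand(T) hazard concrete, here is a rough back-of-the-envelope model in Python. It is not Julia's or CUDA.jl's actual implementation: it simply assumes that rand(Float32) returns k * 2^-24 with k uniform on 0 .. 2^24 - 1, and the helper names are made up for illustration. Under that assumption an exact zero is rare enough to survive CPU testing, yet common enough to show up once a GPU draws billions of samples.

```python
from fractions import Fraction

# Assumption (not taken from any library source): a Float32 sample is
# k * 2^-24 with k drawn uniformly from 0 .. 2^24 - 1.
BITS = 24
GRID = 2 ** BITS


def inverse_of_sample(k: int) -> float:
    """Return 1/x for x = k / 2^BITS, giving inf at x == 0 as IEEE division would."""
    if k == 0:
        return float("inf")
    return GRID / k


def expected_zero_draws(n_draws: int) -> Fraction:
    """Expected number of exact zeros among n_draws independent samples."""
    return Fraction(n_draws, GRID)


# Largest finite value of 1/x on this grid: x = 2^-24, so 1/x = 2^24.
largest_finite = inverse_of_sample(1)

# The one grid point that blows up.
at_zero = inverse_of_sample(0)

# Expected count of exact zeros in a billion draws.
zeros_per_billion = float(expected_zero_draws(10**9))
```

In this model a test run with a few million draws will most likely never hit the zero, while a billion draws are expected to hit it roughly sixty times.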