I learned the hard way that `sort(x, by = _ -> rand())` is not a shuffler, i.e. `x` is not random enough after this "shuffle". `Random.shuffle` seems to be the correct way.
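For comparison, here is a sketch of the textbook Fisher-Yates shuffle, the standard technique for producing a uniformly random permutation: each position is swapped with a uniformly chosen position from the part that has not been fixed yet. The helper name `fisher_yates!` is hypothetical, and I'm not claiming this is exactly how `Random.shuffle` is implemented internally.

```julia
using Random

# Textbook Fisher-Yates shuffle (hypothetical helper name `fisher_yates!`).
# Walks from the last index down to the second; position i is swapped with a
# uniformly chosen index in the not-yet-fixed prefix firstindex(v):i, so every
# ordering of v is equally likely.
function fisher_yates!(v::AbstractVector, rng::AbstractRNG = Random.default_rng())
    for i in lastindex(v):-1:firstindex(v)+1
        j = rand(rng, firstindex(v):i)   # uniform over the unshuffled prefix
        v[i], v[j] = v[j], v[i]
    end
    return v
end

v = fisher_yates!(collect(1:16))
```

Each element is touched once, so the work is linear in `length(v)` and no comparisons are involved at all.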
But why is `sort(x, by = _ -> rand())` not good at shuffling? Is it because `rand()` runs so fast that some successive elements get the same random number? That can't be the case, since every call to `rand()` generates a different number unless the random seed is reset.
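A quick way to test the "same number" idea is to count how often the key function actually runs. The names `calls` and `keyfun` below are just for illustration. As far as I know, `sort` with `by` re-evaluates the key function every time it compares two elements rather than once per element, so with `rand()` every comparison is effectively an independent coin flip and an element never keeps a consistent key. If that holds, the result is shaped by the comparison pattern of the sorting algorithm, not by random keys, which would explain the bias.

```julia
# Count how many times the `by` function is evaluated while sorting 16 elements.
calls = Ref(0)
keyfun = _ -> (calls[] += 1; rand())   # returns a fresh random key on every call
sorted = sort(collect(1:16), by = keyfun)
println("by was called ", calls[], " times for 16 elements")
```

If the count comes out well above 16, keys are being redrawn during the sort, so each call producing a different number is the problem rather than the fix.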
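```julia
using Random, Statistics

# Method 1: "shuffle" by sorting on a random key; one permutation per row
res = mapreduce(vcat, 1:1000) do _
    reshape(sort(1:16, by = _ -> rand()), 1, :)
end
mean(res[:, 1]) # 2.5

# Method 2: Random.shuffle
res2 = mapreduce(vcat, 1:1000) do _
    reshape(shuffle(1:16), 1, :)
end
mean(res2[:, 1]) # 8.342
```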
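To see the effect, consider the code above, where I shuffled the numbers `1:16` 1000 times with each of the two methods and calculated the mean of the first element. For a uniform shuffle that mean should be close to (1 + 16)/2 = 8.5, which `shuffle` matches. The first method is clearly too low, meaning not enough small numbers get shuffled toward the end.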
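If the explanation is that keys get redrawn on every comparison, the sort-by-random-key idea can be rescued by drawing all the keys once up front and sorting by those fixed keys, for example with `sortperm`. This is a minimal sketch under that assumption; `keyed_shuffle` is a hypothetical helper name. Rerunning the same experiment with it should give a first-element mean near 8.5, like `shuffle`.

```julia
using Random, Statistics

# Draw one random key per element, then order the elements by those fixed keys.
# Every element keeps the same key for the whole sort, so the comparisons are
# consistent (ties between Float64 keys are possible but vanishingly rare).
keyed_shuffle(x) = x[sortperm(rand(length(x)))]

# Same experiment as above: 1000 permutations of 1:16, one per row
res3 = mapreduce(vcat, 1:1000) do _
    reshape(keyed_shuffle(collect(1:16)), 1, :)
end
m3 = mean(res3[:, 1])   # should land near 8.5
```

`shuffle` is still the simpler choice: it avoids allocating a key vector and the O(n log n) sort.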