I learned the hard way that `sort(x, by = _ -> rand())` is not a shuffler, i.e. `x` is not random enough after this "shuffle". Using `Random.shuffle` seems to be the correct way.
But why is `sort(x, by = _ -> rand())` not that great at shuffling? Is it because `rand()` runs so fast that successive calls return the same random number? That can't be the case, since every call to `rand()` should generate a different number unless the random seed is reset.
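```julia
using Random, Statistics  # shuffle lives in Random, mean in Statistics

# 1000 trials of the sort-based "shuffle", one row per trial
res = mapreduce(vcat, 1:1000) do _
    reshape(sort(1:16, by = _ -> rand()), 1, :)
end
mean(res[:, 1]) # 2.5

# 1000 trials of Random.shuffle, one row per trial
res2 = mapreduce(vcat, 1:1000) do _
    reshape(shuffle(1:16), 1, :)
end
mean(res2[:, 1]) # 8.342
```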
To see the effect, consider the code above, where I shuffled the numbers `1:16` using the two methods and calculated the mean of the first element. Clearly the mean from the first method is too low (a uniform shuffle should give about 8.5), meaning small numbers are not moved toward the end often enough.
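
One possible explanation, offered as my understanding rather than a confirmed answer: `sort` seems to evaluate the `by` function every time two elements are compared, not once per element. If so, each comparison sees two fresh random keys and is effectively a coin flip, so the ordering is never consistent and elements tend to stay near where they started. The sketch below counts the calls to `by` to check this, then tries an alternative that draws one fixed random key per element and sorts by those. `shuffle_by_keys` is a hypothetical helper name I made up for illustration, not a library function.

```julia
using Random

x = collect(1:16)

# Count how many times the `by` function is invoked during one sort.
calls = Ref(0)
sort(x, by = _ -> (calls[] += 1; rand()))
println("by was called ", calls[], " times for ", length(x), " elements")

# Hypothetical helper (not part of any library): draw one random key per
# element up front, then order the elements by those fixed keys.
shuffle_by_keys(v) = v[sortperm(rand(length(v)))]

y = shuffle_by_keys(x)
println(y)
```

If the printed call count is well above the number of elements, that would support the idea that the keys change between comparisons. Sorting by keys drawn once should behave like a proper shuffle, but `Random.shuffle` is still the simpler choice.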