# Why is `sort(x, by = _ -> rand())` not a good shuffler?

I learned the hard way that `sort(x, by = _ -> rand())` is not a proper shuffler, i.e. `x` is not random enough after this "shuffle".

Using `Random.shuffle` seems to be the correct way.

But why is `sort(x, by = _ -> rand())` not that great at shuffling?

Is it because `rand()` runs so fast that the same random number gets returned for several successive calls? That can't be the case, since every call to `rand()` generates a different number as long as the random seed is not reset.

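```julia
using Random, Statistics  # provides shuffle and mean

# Each row is one "shuffle" of 1:16 done by sorting with a random key.
res = mapreduce(vcat, 1:1000) do _
    reshape(sort(1:16, by = _ -> rand()), 1, :)
end

mean(res[:, 1]) # 2.5

# Same experiment using Random.shuffle.
res2 = mapreduce(vcat, 1:1000) do _
    reshape(shuffle(1:16), 1, :)
end

mean(res2[:, 1]) # 8.342
```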

To see the effect, consider the code above, where I shuffled the numbers `1:16` using the two methods and calculated the mean of the first number. Clearly the mean from the first method is too low (a uniform shuffle should give about 8.5), meaning small numbers do not get shuffled toward the end often enough.

I guess the problem is precisely that the result of `rand()` changes on every call.

If you create a random column once and sort by that column, it should be fine.
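A minimal sketch of that idea, assuming the "random column" is just a vector of keys drawn once before sorting. `shuffle_by_keys` is a made-up name for illustration, not something from Base or `Random`:

```julia
using Random

# Hypothetical helper: shuffle by sorting on keys that are drawn once, up front.
function shuffle_by_keys(x)
    randkeys = rand(length(x))     # one fixed random key per element
    return x[sortperm(randkeys)]   # reorder x by ascending key
end

Random.seed!(1)                    # only to make the run repeatable
y = shuffle_by_keys(collect(1:16))
println(y)
```

Because every element keeps the same key for the whole sort, the comparisons are consistent with each other. In practice `shuffle` is still the simpler choice.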

The trouble is that `rand()` gives you something new on each comparison made by the sort algorithm, so each element does a “random walk”: sometimes it moves toward the front and sometimes it moves back, and on average it ends up more or less where it started.
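One way to check this is to count how often the `by` function actually gets called. This is only a sketch: the names `calls` and `noisy_key` are made up for illustration, and the exact count will depend on the Julia version and the sort algorithm it picks.

```julia
# Count how many times the `by` function is invoked while sorting 16 elements.
calls = Ref(0)
noisy_key = _ -> (calls[] += 1; rand())   # bump the counter, then return a fresh random key

sort(collect(1:16), by = noisy_key)

println("`by` was called ", calls[], " times for 16 elements")
```

If the key were computed once per element, the count would be 16. It comes out larger because `by` is applied to both arguments of every comparison.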


That makes sense, since `rand()` is run on every comparison rather than being fixed per element. I had thought for some reason that the random number only gets generated once.