Why is `sort(x, by = _ -> rand())` not a good shuffler?

xiaodai · February 19, 2022, 11:56am

I learned the hard way that `sort(x, by = _ -> rand())` is not a proper shuffler, i.e. x is not random enough after this shuffle.

Using `Random.shuffle` seems to be the correct way.

But why is `sort(x, by = _ -> rand())` not that great at shuffling?

Is it because `rand()` runs so fast that successive calls return the same random number? That can’t be the case, since every call to `rand()` generates a different number unless the random seed is reset.

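using Random, Statistics  # shuffle lives in Random, mean in Statistics

# Shuffle 1:16 a thousand times with the sort trick; each row is one result.
res = mapreduce(vcat, 1:1000) do _
    reshape(sort(1:16, by = _ -> rand()), 1, :)
end

mean(res[:, 1]) # 2.5

# Same experiment with Random.shuffle.
res2 = mapreduce(vcat, 1:1000) do _
    reshape(shuffle(1:16), 1, :)
end

mean(res2[:, 1]) # 8.342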

To see the effect, consider the code above, where I shuffled the numbers 1:16 using the two methods and calculated the mean of the first element. For a uniform shuffle that mean should be close to 8.5. Clearly the result for the first method is too low, meaning not enough small numbers get shuffled toward the end.

Dictino · February 19, 2022, 12:13pm

I guess the problem is precisely that the result of `rand` changes on each call.

If you create a random column and sort by that column, it should be fine.

But since `rand` gives you something new on each comparison in the sort algorithm, each element does a “random walk”: sometimes it moves toward the front and sometimes it moves back, and on average it ends up more or less where it started.
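
One way to check this is to count how often the `by` function is invoked during a single sort. A minimal sketch, assuming only Base Julia; the counter `calls` and the other variable names are made up for illustration:

```julia
x = collect(1:16)

# Hypothetical counter, incremented every time the `by` function runs.
calls = Ref(0)

sorted = sort(x, by = _ -> (calls[] += 1; rand()))

# If each element got exactly one key, `by` would run length(x) times.
println("elements: ", length(x), ", calls to by: ", calls[])
```

The exact count varies between runs and Julia versions, but it comes out larger than `length(x)`, because the key is recomputed at every comparison instead of being stored per element.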

xiaodai · February 19, 2022, 12:15pm

That makes sense, since `rand()` is run every time and its value is not fixed. For some reason I had thought that the random number only gets generated once per element.
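
To get the generated-once behaviour, the keys can be drawn up front and the sort done on those fixed keys. A minimal sketch, assuming the Statistics standard library is available; `keyshuffle` and `firsts` are hypothetical names used only for illustration:

```julia
using Statistics

# Hypothetical helper: draw one random key per element, then order by those keys.
keyshuffle(x) = x[sortperm(rand(length(x)))]

# Repeat the experiment from the first post, now with fixed keys.
firsts = [first(keyshuffle(1:16)) for _ in 1:1000]
println(mean(firsts))  # should land near 8.5, like shuffle(1:16)
```

`Random.shuffle` is still the simpler choice; this only shows that sorting by random keys works once each key is fixed before the sort starts.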
