Having read through this thread again, I’ve noticed that in basically all benchmarks posted here the setup looked something like this:
```julia
list = rand(n)
# ...
@benchmark [...] setup=(x=copy(list)) # or deepcopy(list)
```
This skews the benchmark heavily: it is always the same list being sorted, so the branch predictor in your CPU learns the patterns in that data.
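One way to avoid this, as a minimal sketch (with `sort!` standing in for whatever call was actually being benchmarked, and an arbitrary size of `10_000`), is to generate a fresh random vector in the `setup` expression and pass `evals=1`, so every measured evaluation sees different data:

```julia
using BenchmarkTools

# Generate a new random vector before every evaluation so the branch
# predictor cannot learn a fixed pattern in the data. evals=1 makes
# BenchmarkTools run the setup once per evaluation instead of reusing
# the same input across many evaluations within a sample.
@benchmark sort!(x) setup=(x = rand(10_000)) evals=1
```

The trade-off is that with `evals=1` each sample is a single evaluation, so per-sample noise goes up, but at least the data is no longer identical across runs.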
Because of that alone, I don’t think the benchmarks posted here support the conclusions drawn (not to mention that they still don’t account for the code being placed at different memory locations, which may also influence the results). Statistics over statistics are meaningless if the underlying measurements are wonky in the first place.