Sorting instability issue

Hello everyone, I am working with Julia 1.10.2 and have a question about the sort function in Julia.

Suppose I run the following code:

time_sort = zeros(100, 1);
for i = 1:100
    n = 10^6
    x0 = rand(n);
    time_sort[i] = @elapsed x0sort = sort(x0, rev=true);
end

The result shows:
julia> time_sort
100×1 Matrix{Float64}:
0.101048458
0.120940708
0.108399541
0.097998959
0.090954625
0.096979917
0.1142765
0.099838583
0.097053875
0.092064458
 ⋮
0.009552855
0.009860167
0.009679833
0.009623333
0.009937875
0.009491167
0.0097165
0.009016333
0.009795333
0.009923958

The first 20 elements of time_sort are about 0.1 s, but the final elements are about 0.01 s, a difference of roughly a factor of 10, even though every iteration runs the same code. I am wondering why this happens. Does anyone know?

I can’t reproduce this:

julia> using UnicodePlots

julia> scatterplot([@elapsed sort(rand(10^6), rev=true) for _ ∈ 1:100])
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   0.046 β”‚β‘€β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β  β €β €β €β‘€β’€β ‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β €β €β „β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β €β €β €β β €β €β ˆβ β €β €β €β €β €β €β €β €β €β „β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β”‚
         β”‚β €β‘€β ˆβ €β €β „β €β‘€β ‚β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β  β €β €β”‚
         │⠀⒀⑀⠁⠠⠀⠀⠀⠠⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⒀⠀⠁⠀⠀⠀⠄⑄⒀│
         β”‚β €β €β €β  β €β €β ˆβ €β €β β ”β’‚β „β‘β£„β €β‘”β‘€β£€β ”β‘”β’β €β ”β ’β Šβ‘„β’„β €β „β’„β‘€β Œβ  β •β ‘β Œβ €β β’„β”‚
   0.037 β”‚β €β €β €β €β €β €β €β €β €β ˆβ β €β ‰β €β €β €β €β ˆβ €β €β €β €β €β €β €β €β ˆβ €β β ˆβ €β €β €β €β €β €β €β €β €β €β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β €0β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €β €100β €

There’s slightly higher runtime for the first execution and some general noise later on, which is why people generally benchmark with BenchmarkTools or Chairmarks to get more reliable results.
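For instance, a minimal sketch with BenchmarkTools (assuming the package has been added with `] add BenchmarkTools`):

```julia
using BenchmarkTools

x0 = rand(10^6)

# @btime runs the expression many times and reports the minimum time,
# which filters out compilation and other one-off noise. Interpolating
# with $x0 avoids measuring global-variable access overhead.
@btime sort($x0, rev=true);
```

@benchmark instead of @btime reports the full timing distribution if you want more detail.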

Also,

Any specific reason for this? The current release is 1.11.2 and the LTS is 1.10.7, so there isn’t really a reason to be on 1.10.2.

I have no idea. I can’t reproduce the strange result anymore: when I run the code again now, the times are consistently around 0.01 s instead of 0.1 s.

By the way, what is the default sorting algorithm in Julia? I found that it costs about 0.1 s if I use mergesort or quicksort:

sort(x0, rev=true, alg=MergeSort) or sort(x0, rev=true, alg=QuickSort)

However, the default algorithm is much faster than quicksort or mergesort. When I check the Julia documentation, it says: "Currently, a hybrid of RadixSort, ScratchQuickSort, InsertionSort, and CountingSort is used based on input type, size, and composition." So the default algorithm is a hybrid of these four algorithms?
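One quick, illustrative way to compare on the same data (timings vary by machine, and a single @elapsed measurement is noisy):

```julia
x0 = rand(10^6)

# Warm up first so compilation time is not included in the comparison.
sort(x0, rev=true); sort(x0, rev=true, alg=QuickSort);

t_default = @elapsed sort(x0, rev=true)                 # default (radix-based path for Float64)
t_quick   = @elapsed sort(x0, rev=true, alg=QuickSort)  # forced comparison-based sort

println("default: $(t_default) s, QuickSort: $(t_quick) s")
```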

No specific reason for choosing this version. I downloaded Julia last year and chose this version.

Probably not a problem either, but for convenient management of multiple Julia versions, including keeping an up-to-date installation, it’s recommended to use juliaup:
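A typical session might look like this (assuming juliaup is already installed; see its README for installation instructions):

```shell
juliaup add lts        # install the long-term-support channel
juliaup default lts    # make it the default "julia" on the PATH
juliaup update         # update all installed channels
juliaup status         # list installed channels and versions
```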

Thanks for your advice! It may be a problem, since I want to compare my algorithm with the sorting algorithms in Julia. Everything seems fine if I set the sorting algorithm to MergeSort, QuickSort, or another specific algorithm. However, the default algorithm in the sort function is much faster than the others, which confuses me.

Sorry, I meant that using Julia 1.10.2 is likely not a problem, although if you are interested in the very granular details of sort, maybe it is, since the actual implementation of the default sort is an implementation detail.

Have you looked at the extended help as advised in the manual?

help?> ?Base.DEFAULT_STABLE
  β”‚ Warning
  β”‚
  β”‚  The following bindings may be internal; they may change or be removed in future versions:
  β”‚
  β”‚    β€’  Base.DEFAULT_STABLE

  DEFAULT_STABLE

  The default sorting algorithm.

  This algorithm is guaranteed to be stable (i.e. it will not reorder elements that compare equal). It makes an effort to be fast for most inputs.

  The algorithms used by DEFAULT_STABLE are an implementation detail. See extended help for the current dispatch system.

  Extended Help
  ≑≑≑≑≑≑≑≑≑≑≑≑≑

  DEFAULT_STABLE is composed of two parts: the InitialOptimizations and a hybrid of Radix, Insertion, Counting, Quick sorts.

  We begin with MissingOptimization because it has no runtime cost when it is not triggered and can enable other optimizations to be applied later. For example, BoolOptimization cannot apply to an
AbstractVector{Union{Missing, Bool}}, but after MissingOptimization is applied, that input will be converted into an AbstractVector{Bool}.

  We next apply BoolOptimization because it also has no runtime cost when it is not triggered and when it is triggered, it is an incredibly efficient algorithm (sorting Bools is quite easy).

  Next, we dispatch to InsertionSort for inputs with length <= 10. This dispatch occurs before the IEEEFloatOptimization pass because the IEEEFloatOptimizations are not beneficial for very small inputs.

  To conclude the InitialOptimizations, we apply IEEEFloatOptimization.

  After these optimizations, we branch on whether radix sort and related algorithms can be applied to the input vector and ordering. We conduct this branch by testing if UIntMappable(v, order) !== nothing.
  That is, we see if we know of a reversible mapping from eltype(v) to UInt that preserves the ordering order. We perform this check after the initial optimizations because they can change the input vector's
  type and ordering to make them UIntMappable.

  If the input is not UIntMappable, then we perform a presorted check and dispatch to ScratchQuickSort.

  Otherwise, we dispatch to InsertionSort for inputs with length <= 40 and then perform a presorted check (CheckSorted).

  We check for short inputs before performing the presorted check to avoid the overhead of the check for small inputs. Because the alternate dispatch is to InsertionSort which has efficient O(n) runtime on
  presorted inputs, the check is not necessary for small inputs.

  We check if the input is reverse-sorted for long vectors (more than 500 elements) because the check is essentially free unless the input is almost entirely reverse sorted.

  Note that once the input is determined to be UIntMappable, we know the order forms a total order (wikipedia.org/wiki/Total_order) over the inputs and so it is impossible to perform an unstable sort because
  no two elements can compare equal unless they are equal, in which case switching them is undetectable. We utilize this fact to perform a more aggressive reverse sorted check that will reverse the vector [3,
  2, 2, 1].

  After these potential fast-paths are tried and failed, we ComputeExtrema of the input. This computation has a fairly fast O(n) runtime, but we still try to delay it until it is necessary.

  Next, we ConsiderCountingSort. If the range of the input is small compared to its length, we apply CountingSort.

  Next, we ConsiderRadixSort. This is similar to the dispatch to counting sort, but we consider the number of bits in the range, rather than the range itself. Consequently, we apply RadixSort for any
  reasonably long inputs that reach this stage.

  Finally, if the input has length less than 80, we dispatch to InsertionSort and otherwise we dispatch to ScratchQuickSort.
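The stability guarantee from the docstring above can be seen with a small example: elements that compare equal keep their original relative order.

```julia
# Sort pairs by their first element; the two pairs with key 2
# keep their original order ("a" before "c") because the sort is stable.
v = [(2, "a"), (1, "b"), (2, "c")]
println(sort(v, by=first))  # [(1, "b"), (2, "a"), (2, "c")]
```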

For large arrays of Float64 (and many other types), Julia uses a RadixSort.

I see. Thanks for your help! My timing results are consistent with that statement.

Thank you very much! I will carefully check this manual.

If you are interested in sorting things FAST, then I cannot recommend @Lilith’s JuliaCon talks enough, starting with this overview from last year’s JuliaCon:

There were a couple of other, shorter talks by her on sorting as well.


Thank you for providing the video! It is great for learning the sort function in Julia.