DataFrames and sorting - is the default sort a stable sort?

Here’s a link to the documentation page on DataFrames and sorting.

Some details are provided here, but unfortunately the information as to whether the default sorting algorithm is a stable sort is not included.

Does anyone happen to know if DataFrames uses a stable sort?

Afaiu it uses a stable sort by default, cf here:

Sort.defalg(df::AbstractDataFrame) =
    size(df, 1) < 8192 ? Sort.MergeSort : SortingAlgorithms.TimSort

This could be documented better (there)! And probably the implementation should be changed to just use Julia’s new default sort implementation.

I would have assumed for such works stable sorting is always used, but actually for Pandas I see not, unless I’m misreading its docs (though stable is an option there):

DataFrame.sort_values(*by*, ***, *axis=0*, *ascending=True*, *inplace=False*, 

Note, Julia uses stable sort by default, so maybe all docs of packages should assume that, unless saying otherwise.

@foobar_lv2, I see lines preceding:

# TimSort is fast for data with structure, but only if the DataFrame is large enough
# TODO: 8192 is informed but somewhat arbitrary

If TimSort were fast/est then Julia would use be default? It’s Pandas’ default stable sort, and mergesort there an alias for it.

help?> sort!
sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)

Sort the vector v in place. A stable algorithm is used by default: the ordering of elements that compare equal is preserved. […]

help?> ?Sort.DEFAULT_STABLE
The default sorting algorithm.

This algorithm is guaranteed to be stable (i.e. it will not reorder elements that compare equal). It makes an effort to be fast for most inputs.

The algorithms used by DEFAULT_STABLE are an implementation detail. See extended help for the current dispatch system.

Extended Help
≡≡≡≡≡≡≡≡≡≡≡≡≡

DEFAULT_STABLE is composed of two parts: the InitialOptimizations and a hybrid of Radix, Insertion, Counting, Quick sorts.

[I count probably a hybrid of 6 algorithms currently used, I know a lot of work went into this in recent times]

Finally, if the input has length less than 80, we dispatch to InsertionSort and otherwise we dispatch to ScratchQuickSort.

help?> ?Sort.DEFAULT_UNSTABLE

The algorithms used by DEFAULT_UNSTABLE are an implementation detail. They are currently the same as those used by DEFAULT_STABLE, but this is subject to change in future.

[It used to mean the unstable QuickSort.]

Do not ask for this directly (there’s not need for since now slower I think, or at least no speed-improvement over the stable algorithm):

help?> Sort.QuickSort
  QuickSort

  Indicate that a sorting function should use the quick sort algorithm, **which is not stable.** [,,[

Julia has always sorted stably be default (or am I reading it wrong in old 1.6 docs):

Sort the vector v in place. QuickSort is used by default for numeric arrays while MergeSort is used for other arrays.