A.
The fastest stable sort (published last month, based on his quadsort that’s “faster than quicksort”) seems to be: GitHub - scandum/blitsort: Blitsort is an in-place stable adaptive rotate mergesort / quicksort.
Blitsort uses a monobound binary search, which is up to two times faster than the binary search in general use.
Trinity rotation
Blitsort uses a trinity rotation, a new and significantly faster array rotation algorithm.
[…]By default blitsort uses 512 elements worth of stack memory. […] Blitsort rotate merges recursively, requiring an additional log(n) memory. It’s possible to make this O(1) through the implementation of a stack.
Performance
Blitsort has exceptional performance due to the quad swap, monobound binary search, and trinity rotation. It is likely the fastest in-place stable sort written so far and is about 15% faster than octosort, which is a block merge sort.
Blitsort’s performance is similar to that of quadsort as long as
His quad sort and all the variants are also very well documented, and also WikiSort:
B.
WikiSort vs. std::stable_sort()
(clang++ version 3.2, sorting 0 to 1.5 million items)
Up do 1280% faster, always faster for the many cases in the two tables there, except for one “95% as fast”