Move Random into Base?

It would be nice to have quicksort select its pivot randomly rather than using median of three to prevent malicious data from causing unexpected quadratic runtime. Benchmarking reveals that this causes negligible performance changes for random input.

The only obstacle, I believe, is that quicksort is in base and used during bootstrapping and random is a stdlib, so sorting cannot depend on random.

Has moving Random to base come up before?
Can someone point me to a discussion on criteria and considerations for moving code into and out of base vs the standard libraries?
Does anyone have comments on this particular case?

To be clear, I don’t actually expect to move Random to Base, but I still think its worth brining up.

3 Likes

There’s already some duplication between Random and Base/Core going on, in particular in regards to Xoshiro due to it also being used in task management for task local RNG state. That duplicate code lives in C, in particular in task.c.

That said, just rand() is available without using Random, so I’m a little confused why using that would begin depending on Random? Do you have an example in mind that breaks this?

Thanks! The fact that Xoshiro is already in Base/Core is promising for my use case.

rand is declared in Base, but defined in Random.

I do have an example in mind. If I replace selectpivot with rand(lo:hi) here in this PR, whether or not I add rand to the using list at the top of the file I get errors during bootstrap. Without using rand I get

    JULIA usr/lib/julia/corecompiler.ji
error during bootstrap:
LoadError(at "compiler/compiler.jl" line 3: LoadError(at "compiler/bootstrap.jl" line 10: UndefVarError(var=:rand)))
ijl_undefined_var_error at /Users/x/.julia/dev/julia_dm/src/rtutils.c:132
ijl_get_binding_or_error at /Users/x/.julia/dev/julia/src/module.c:407
unknown function (ip: 0x11429662d)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x1140a07be)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11409dea1)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11409d9b1)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11409d6a0)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11409cd39)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x114095a1b)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11408f35e)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11408ab36)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x11408a971)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0834ca)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f07fdf8)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0c3c12)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0c099c)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0b1b8f)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0a437a)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f09fbcd)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f09e6c2)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f093049)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f08d266)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f08516a)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f081485)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f07fdf8)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0c3c12)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0c099c)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0b1b8f)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f0a437a)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f09fbcd)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f09e6c2)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f093049)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f08be73)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f08516a)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f081485)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f07fdf8)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f05c1fd)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f051b4a)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10f04ba17)
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:903
jl_parse_eval_all at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1046
ijl_load_ at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1093
unknown function (ip: 0x10effbeb9)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10effbd80)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
jl_apply at /Users/x/.julia/dev/julia_dm/src/./julia.h:1840 [inlined]
do_call at /Users/x/.julia/dev/julia_dm/src/interpreter.c:126
eval_body at /Users/x/.julia/dev/julia_dm/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/x/.julia/dev/julia_dm/src/interpreter.c:750
top-level scope at compiler/compiler.jl:157
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:912
jl_eval_module_expr at /Users/x/.julia/dev/julia_dm/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:715
ijl_toplevel_eval at /Users/x/.julia/dev/julia_dm/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at /Users/x/.julia/dev/julia_dm/src/toplevel.c:971
unknown function (ip: 0x10effadf9)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
jl_apply at /Users/x/.julia/dev/julia_dm/src/./julia.h:1840 [inlined]
do_call at /Users/x/.julia/dev/julia_dm/src/interpreter.c:126
eval_body at /Users/x/.julia/dev/julia_dm/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/x/.julia/dev/julia_dm/src/interpreter.c:750
top-level scope at compiler/compiler.jl:3
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:912
jl_parse_eval_all at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1046
ijl_load_ at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1093
ijl_load at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1106
exec_program at /Users/x/.julia/dev/julia_dm/src/jlapi.c:525
true_main at /Users/x/.julia/dev/julia_dm/src/jlapi.c:578
jl_repl_entrypoint at /Users/x/.julia/dev/julia_dm/src/jlapi.c:710

make[1]: *** [/Users/x/.julia/dev/julia_dm/usr/lib/julia/corecompiler.ji] Error 1
make: *** [julia-sysimg-ji] Error 2

with it I get

    JULIA usr/lib/julia/corecompiler.ji
error during bootstrap:
LoadError(at "compiler/compiler.jl" line 3: LoadError(at "sort.jl" line 8: UndefVarError(var=:rand)))
ijl_undefined_var_error at /Users/x/.julia/dev/julia_dm/src/rtutils.c:132
jl_eval_global_var at /Users/x/.julia/dev/julia_dm/src/interpreter.c:150
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:734
jl_eval_module_expr at /Users/x/.julia/dev/julia_dm/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:715
jl_parse_eval_all at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1046
ijl_load_ at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1093
unknown function (ip: 0x10b337eb9)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
unknown function (ip: 0x10b337d80)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
jl_apply at /Users/x/.julia/dev/julia_dm/src/./julia.h:1840 [inlined]
do_call at /Users/x/.julia/dev/julia_dm/src/interpreter.c:126
eval_body at /Users/x/.julia/dev/julia_dm/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/x/.julia/dev/julia_dm/src/interpreter.c:750
top-level scope at compiler/compiler.jl:114
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:912
jl_eval_module_expr at /Users/x/.julia/dev/julia_dm/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:715
ijl_toplevel_eval at /Users/x/.julia/dev/julia_dm/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at /Users/x/.julia/dev/julia_dm/src/toplevel.c:971
unknown function (ip: 0x10b336df9)
_jl_invoke at /Users/x/.julia/dev/julia/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/x/.julia/dev/julia/src/gf.c:2574
jl_apply at /Users/x/.julia/dev/julia_dm/src/./julia.h:1840 [inlined]
do_call at /Users/x/.julia/dev/julia_dm/src/interpreter.c:126
eval_body at /Users/x/.julia/dev/julia_dm/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/x/.julia/dev/julia_dm/src/interpreter.c:750
top-level scope at compiler/compiler.jl:3
jl_toplevel_eval_flex at /Users/x/.julia/dev/julia_dm/src/toplevel.c:912
jl_parse_eval_all at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1046
ijl_load_ at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1093
ijl_load at /Users/x/.julia/dev/julia_dm/src/toplevel.c:1106
exec_program at /Users/x/.julia/dev/julia_dm/src/jlapi.c:525
true_main at /Users/x/.julia/dev/julia_dm/src/jlapi.c:578
jl_repl_entrypoint at /Users/x/.julia/dev/julia_dm/src/jlapi.c:710

make[1]: *** [/Users/x/.julia/dev/julia_dm/usr/lib/julia/corecompiler.ji] Error 1
make: *** [julia-sysimg-ji] Error 2

Right, it’s just reexported :thinking:

A dirty workaround would be to ccall jl_genrandom with a manually acquired task-local RNG state, or have a different quicksort implementation during bootstrapping, to retain the performance advantage for user code (maybe @eval the better one after bootstrapping…? Feels dirty as well…).

Or, even sketchier, only use rand for partitions larger than 100. As long as bootstrap doesn’t sort anything big, it will be pass, and then malicious input is capped at O(n log(n)) with a bad constant factor, rather than a full O(n^2).

Someone down the line will introduce something to bootstrap that sorts a medium-sized array and get a hard to trace error.

It’s not actually a performance advantage in typical cases, just an advantage against malicious input.

2 Likes

If it’s only against malicious input, could Base use a different sorting algorithm during bootstrapping? We’re not likely to encounter malicious input there, I don’t think. Not sure how feasible that is, since future uses of sort during bootstrapping would have to take care not to use quicksort too early.

Base could exclusively use insertion sort for bootstrapping. It’s O(n^2) with a great constant factor, stable, in place, and a tiny code size. Seems like a natural choice for bootstrapping, so long as we never have to sort a massive list. For reference, it takes 10ms to sort 100k elements and 2μs to sort 100 elements. This would let us use all sorts of fancy tools for advanced sorting algorithms instead of using workarounds like this

2 Likes

It would be nice to have quicksort select its pivot randomly rather than using median of three to prevent malicious data from causing unexpected quadratic runtime.

Seems like a worthy goal, but the randomness generated by Random (xoshiro256++) isn’t the secure randomness you need for this purpose. Consider using a CSPRNG or Randen (which was specifically designed for security against such adversarial input attacks):

On the other hand, if you’re talking about Quicksort specifically, the proper solution would presumably be to use a Quicksort variant designed specifically for resistance against adversarial input. Golang used to implement one such algorithm (based on Quicksort and Heapsort, depending on recursion depth) for its sort.Sort, and nowadays it seems they use yet another Quicksort variant, pdqsort:

Thanks! Indeed using rand() would not provide cryptographic guarantees against a sufficiently knowledgable advisory, and looking back at my OP, this was the only motivation I mentioned. However, while cryptographic security against malicious input causing quadratic runtime is a worthy goal, it is not my primary goal in this case.

It would still be much harder to utilize quicksort’s quadratic runtime to conduct a DOS attack if it were randomized. For the quadratic runtime attach to be meaningfully harmful to anyone but the perpetrator, it would need to be conducted against a server not operated by the perpetrator which serves users other than the perpetrator, which would make tracking the RNG state difficult.

Another benefit to using random pivots instead of median of three is it is easier to prove that for all reasonable data the algorithm will be n log n. Consider, for example, if all the input data is the same, a very plausible non-malicious test case. It is essential to pick the middle element as the pivot to avoid quadratic runtime. Random pivot selection avoids these edge cases.

Finally, it’s shorter and simpler to call rand(lo:hi) than to write a select_pivot method.

All things considered, while rand doesn’t reach the level of cryptographic security against quadratic runtime, it is shorter, simpler, more robust to accidental quadratic runtime, more robust to bugs, more robust to malicious input from mediocre adversaries, and of empirically comparable performance in comparison to median of three.

2 Likes

A better phrase than “Malicious input” might be “Worst-case input.” I don’t think anyone particularly needs strong security here, but worst-case scenarios can be surprisingly common. Lilith already mentioned the case that defeats the median of 3 pivot, but other pivots will perform poorly in other cases. For instance, choosing the first element will result in worst-case runtime if the array is already sorted.

1 Like

I’m pretty sure that in pretty much all cases, you don’t need cryptographic randomness to prevent DOS against an attacker with a reasonable threat model. Figuring out RNG state on a server (which is the only place this would come up) is pretty difficult since you only can go off of a fairly noisy timing, and the RNG isn’t awful.

1 Like

(That being said, is there a reason the median-of-three strategy breaks down if you always choose the middle element in case of a tie?)

Yes. I’m pretty sure (but don’t technically have a proof) that any deterministic quicksort like algorithm must look at O(n) elements when choosing a pivot to guarantee O(n*log(n)) in the worst case.

I think the second half of my comment may have gone unnoticed, so I’ll reiterate:

Instead of relying on randomization, Julia could just do what other languages do and use a better algorithm. Namely, it’s possible to use quicksort most of the time, but fall back to heapsort in the worst case, completely avoiding quadratic complexity. A variant that’s currently popular is pdqsort.

TBH: I’m not sure how is sort! currently implemented in Julia, maybe it already does the right thing?

I find it kind of funny when folks say “we should use X sorting algorithm because Y language uses it.” I’m not an expert at benchmarking Go stdlibs, but according to my benchmarks, Julia-1.7.2 is 2.5-30x faster than Go and Julia nightly is 7-70x faster. I’m not inclined to copy their sorting system.

x@X go % julia test.jl
  264.182 ns (0 allocations: 0 bytes)
  5.140 ms (0 allocations: 0 bytes)
  394.005 ns (0 allocations: 0 bytes)
  5.706 ms (0 allocations: 0 bytes)
x@X go % julia-1.9 test.jl
  108.883 ns (0 allocations: 1 bytes)
  1.699 ms (4 allocations: 785.52 KiB)
  258.910 ns (0 allocations: 0 bytes)
  1.996 ms (4 allocations: 789.52 KiB)
x@X go % go test -bench=. 

goos: darwin
goarch: amd64
pkg: mmain
cpu: Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
BenchmarkTest_Perm_1e2-4    	  153151	      7913 ns/op
BenchmarkTest_Perm_1e5-4    	      86	  13346754 ns/op  (13 ms)
BenchmarkTest_Float_1e2-4   	  137232	      8785 ns/op
BenchmarkTest_Float_1e5-4   	      76	  15327447 ns/op  (15 ms)
PASS
ok  	mmain	22.278s

Julia source code

using BenchmarkTools, Random

@btime sort!(x) (setup=(x=randperm(10^2)))
@btime sort!(x) (setup=(x=randperm(10^5)))
@btime sort!(x) (setup=(x=rand(10^2)))
@btime sort!(x) (setup=(x=rand(10^5)))

Go source code

package mmain
import "math/rand"
import "sort"
import "testing"

func Perm(n int, b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        slice := rand.Perm(n)
        b.StartTimer()
        sort.Ints(slice)
    }
}

func randFloats(n int) []float64 {
    res := make([]float64, n)
    for i := range res {
        res[i] = rand.Float64()
    }
    return res
}

func Float(n int, b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        slice := randFloats(n)
        b.StartTimer()
        sort.Float64s(slice)
    }
}

func BenchmarkTest_Perm_1e2(b *testing.B) {
    Perm(100, b)
}
func BenchmarkTest_Perm_1e5(b *testing.B) {
    Perm(100000, b)
}
func BenchmarkTest_Float_1e2(b *testing.B) {
    Float(100, b)
}
func BenchmarkTest_Float_1e5(b *testing.B) {
    Float(100000, b)
}

HOWEVER, having said that, I see a clear advantage to adding worst-case detection and avoidance to Julia’s default sorting. The issue is that it comes with a complexity cost. Not runtime complexity, but source code complexity. We currently don’t have a great backup (merge sort would be okay, but I actually hope to remove it from base entirely at some point) so we’d need to add heapsort or some such to base. Source code complexity comes at a serious cost. It is essential to be and stay readable and understandable—and concision is a part of that—to be inviting to new contributors, to take advantage of new architectures when they come out, and to quickly and happily fix bugs when they inevitably emerge.

If we’re going to be adding a substantive amount of complexity to sorting, I’d rather use black magic to get another 2x speedup than eliminate a theoretically possible but practically unreachable worst case.

3 Likes

What would stdlib use for stable sorting in that scenario?

What you wanted to measure (I think) was the speed difference between different algorithms. Your benchmarks, however can’t hope to produce a relevant measurement, because you’re also comparing Go codegen against Julia/LLVM codegen, and the particular interface design of Go’s sort package likewise has special constraints that limit its performance and aren’t relevant for Julia’s stdlib.

1 Like

Sorry, my comment above is misleading. The all-same case is an example of a possible unexpected edge case, but it is actually handled gracefully by the current quicksort.

It depends on the partitioning algorithm. With the specific partitioning method that Julia uses, selecting the middle element as pivot is great in the all-same case. It ends up reversing the whole vector and recursing on perfectly balanced halves. It is possible, however, to write an unstable pivoting algorithm that would place the pivot at the beginning or at the end in that case, because if every element is the same and stability is not guaranteed, it is acceptable to put the pivot anywhere. This is a performance bug but not a correctness bug. To actually defeat the median of three algorithm currently in use takes quite a messy and particular input. That’s why the original authors were okay with this theoretical worst-case and why nobody has complained about it in years.

Fun fact: as of Julia-1.9, the all-same case is handled with exactly n-1 comparisons and performs no writes.

2 Likes

Quicksort

1 Like

That’s not what I said. I claim two things:

  1. Julia could use a better algorithm, with optimal worst case performance and good average performance.
  2. pdqsort in particular is currently popular. Note that Go is not the only language using pdqsort, Rust, e.g., also uses pdqsort for unstable sorting.
1 Like