Sorting a vector of fixed size

I need to solve a combinatorics problem, that requires, among others, finding and sortperm (a.k.a. argsort) for thousands of millions of vectors of fixed size.

It so happens that the size of the vector is known at the compile time in my setup.
Since the size of the vectors is expected to be small (single digit), I expect a significant speedup when compiler is given the size of the vectors, because it theoretically can unroll the sort into a hierarchy of β€œif” statements.

Is this kind of optimization achievable in Julia? If so, how?

Edit: in C++ one can approach the problem like this: c++ - Very fast sorting of fixed length arrays using comparator networks - Stack Overflow

One approach is to use SVectors from StaticArrays.jl:

1.7.0-rc1> using BenchmarkTools, StaticArrays

1.7.0-rc1> @benchmark sort(v) setup=(v=rand(8))  # ordinary vectors
BenchmarkTools.Trial: 10000 samples with 988 evaluations.
 Range (min … max):  47.773 ns … 675.911 ns  β”Š GC (min … max): 0.00% … 90.25%
 Time  (median):     55.466 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   59.278 ns Β±  22.770 ns  β”Š GC (mean Β± Οƒ):  1.54% Β±  4.36%

   β–ƒβ–…β–†β–‡β–ˆβ–ˆβ–‡β–†β–ƒβ–                                                  β–‚
  β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–†β–‡β–†β–…β–†β–„β–†β–…β–…β–‚β–ƒβ–…β–ƒβ–…β–†β–†β–ˆβ–‡β–‡β–‡β–†β–…β–†β–†β–†β–…β–†β–…β–†β–†β–‡β–‡β–‡β–ˆβ–ˆβ–‡β–‡β–…β–†β–†β–…β–†β–†β–…β–…β–… β–ˆ
  47.8 ns       Histogram: log(frequency) by time       125 ns <

 Memory estimate: 128 bytes, allocs estimate: 1.

1.7.0-rc1> @benchmark sort(v) setup=(v=@SVector rand(8))  # static vectors
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  21.586 ns … 256.124 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     23.293 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   24.495 ns Β±   6.133 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–‡β–ƒβ–‡β–ˆβ–‚β–β–‚β–β–β–‚β–‚β–β–β–„β–β–‚                                             β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–…β–…β–…β–…β–…β–…β–…β–…β–†β–†β–…β–„β–…β–†β–†β–†β–…β–‡β–†β–†β–†β–†β–‡β–†β–†β–†β–…β–†β–…β–†β–…β–†β–†β–…β–…β–„β–ƒβ–ƒβ–…β–„β–„β–β–… β–ˆ
  21.6 ns       Histogram: log(frequency) by time      55.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

It’s pretty fast, but the speedup is less than what you will see from other operations, probably.

SVectors are statically sized as well as immutable, if you absolutely have to mutate the vectors, there are also MVector, but in most cases immutability is not a problem.

2 Likes

(Edit: The below benchmarks are not quite reliable, since it probably measures sorting vectors that are already sorted during the benchmark. I’m not sure how to benchmark in-place sorting of short vectors, since using evals=1 does not work quite well either. Benchmarking of sort (not sort!) is probably more reliable.)

Hmm, I tried MVector as well as sorting in-place:

1.7.0-rc1> @benchmark sort!(v) setup=(v=@MVector rand(8))
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  18.273 ns … 89.157 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     18.976 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   19.758 ns Β±  3.703 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ƒβ–ˆβ–‡β–†β–ƒβ–ƒβ–β–‚      ▁     β–‚ β–‚                                     β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–„β–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–‡β–„β–…β–„β–„β–ƒβ–„β–β–β–β–ƒβ–„β–„β–ƒβ–β–„β–β–β–ƒβ–β–β–„β–„β–†β–…β–…β–…β–„β–„β–…β–…β–…β–„β–„β–…β–„ β–ˆ
  18.3 ns      Histogram: log(frequency) by time        37 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

1.7.0-rc1> @benchmark sort!(v) setup=(v=rand(8))
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  19.679 ns … 130.924 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     21.084 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   22.022 ns Β±   5.150 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–„β–‡β–ˆβ–†β–„β–‚β–β–ƒβ–β–   β–‚β–ƒ                                              β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–‡β–…β–„β–…β–…β–…β–„β–„β–β–ƒβ–β–β–β–β–…β–‡β–‡β–‡β–…β–…β–…β–…β–…β–…β–…β–…β–†β–†β–‡β–…β–†β–†β–†β–†β–…β–„β–„β–…β–„β–„β–ƒβ–ƒβ–…β–„β–„ β–ˆ
  19.7 ns       Histogram: log(frequency) by time      49.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Sorting in-place with sort! is actually faster than sorting SVectors, which surprises me a bit. Keep in mind, though, that SVectors have many other performance benefits that you should consider.

1 Like

You could try to use https://github.com/JeffreySarnoff/SortingNetworks.jl or https://github.com/nlw0/ChipSort.jl

These are great libraries, but none provides a sortperm/argsort, just sort.

There is also https://github.com/xiaodaigh/SortingLab.jl, although they don’t seem to use sorting networks…