Performance of custom lt function in sort!

I am interested in writing my own lt function within a sort for better performance in a specific use case (in this case, I only care if the numbers are a set length apart from one another). However, it seems that substituting anything other than isless for the lt function decreases performance.

Take this simple example:

using BenchmarkTools
@benchmark sort!(X, lt = isless) setup = (X = randn(1_000_000))

This benchmarks as

BenchmarkTools.Trial: 65 samples with 1 evaluation.
 Range (min … max):  73.737 ms …  75.956 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     74.748 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.770 ms ± 469.959 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▄▁▁▁    ▄  ██ ▄ ▄▁▁▁▁▁▄      ▁                  
  ▆▁▆▁▆▆▁▁▁▁▁▁▆▁████▆▁▆▆█▆▆██▁█▆███████▁▆▆▆▆▆█▆▁▆▁▆▁▁▁▁▆▆▁▆▁▁▆ ▁
  73.7 ms         Histogram: frequency by time         75.8 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

However, a seemingly equivalent substitution can be made:

new_lt(x::AbstractFloat, y::AbstractFloat) = x < y
@benchmark sort!(X, lt = new_lt) setup = (X = randn(1_000_000))

This should do the same thing, I think; if anything, it should bypass the isnan-related checks. However, performance decreases, and the above code benchmarks as follows:

BenchmarkTools.Trial: 56 samples with 1 evaluation.
 Range (min … max):  86.292 ms …  90.038 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     87.803 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   87.871 ms ± 885.583 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁   ▁       ▁▁▁       ▁ ▄   ▁    ▄  ▁            ▁█           
  █▁▁▁█▁▁▆▆▆▁▁███▆▆▆▆▆▆▁█▆█▁▁▆█▆▁▆▁█▆▆█▆▆▆▁▆▁▁▆▆▆▁▁██▁▆▆▆▁▁▁▆▆ ▁
  86.3 ms         Histogram: frequency by time         89.4 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
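To rule out the two comparators simply producing different results, here is a quick sanity check (using the new_lt defined above): on NaN-free data both orderings agree, so the timing gap must come from how the sort machinery treats the comparator rather than from the output.

```julia
# Sanity check: on NaN-free data, new_lt and isless induce the same order,
# so the timing difference is not explained by different results.
new_lt(x::AbstractFloat, y::AbstractFloat) = x < y

X = randn(1_000)
@assert sort(X, lt = new_lt) == sort(X, lt = isless)
```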

I’m curious what the difference is here, and whether there’s any way around it.

What’s more surprising is that < for Float64 is just Base.lt_float, yet:

julia> @benchmark sort!(X, lt = Base.lt_float) setup = (X = randn(1_000_000))
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.125 s …   1.264 s  ┊ GC (min … max): 1.53% … 9.27%
 Time  (median):     1.131 s              ┊ GC (median):    1.52%
 Time  (mean ± σ):   1.163 s ± 59.183 ms  ┊ GC (mean ± σ):  3.20% ± 3.47%

  ███               █                                     █  
  ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.12 s         Histogram: frequency by time        1.26 s <

 Memory estimate: 1.02 GiB, allocs estimate: 68303322.
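My guess (an assumption, not something I’ve verified in the compiler) is that the huge allocation count comes from Base.lt_float being an intrinsic: passed directly as the lt argument, its arguments appear to get boxed. Wrapping it in an ordinary function gives the compiler something it can specialize on:

```julia
# Base.lt_float is a compiler intrinsic; passing it directly as `lt` seems
# to force boxing (note the 1.02 GiB of allocations above). A plain wrapper
# function restores normal specialization.
wrapped_lt(x::Float64, y::Float64) = Base.lt_float(x, y)

@assert wrapped_lt(1.0, 2.0)
@assert !wrapped_lt(2.0, 1.0)
# @benchmark sort!(X, lt = wrapped_lt) setup = (X = randn(1_000_000))
```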

Beating Base this way is still possible if your “metric” is special, but it’s hard to do if you’re just comparing by numeric value, since that path is (I assume) already optimized.

The default sort order is heavily optimized in multiple ways. I don’t recall exactly, but it’s possible that changing away from the default comparison switches the algorithm from a quicksort to a merge sort. The default sort also uses integer operations and handles NaNs in a special, clever way.
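The “integer operations” part can be sketched like this (my own illustration of the standard IEEE-754 bit trick, not the actual Base code):

```julia
# IEEE-754 doubles can be order-compared as unsigned integers after a bit
# transform: set the sign bit for non-negative values, flip all bits for
# negative ones. (NaNs excluded; Base handles those separately.)
flip(x::Float64) = (u = reinterpret(UInt64, x);
                    signbit(x) ? ~u : u ⊻ 0x8000000000000000)

int_lt(x::Float64, y::Float64) = flip(x) < flip(y)

# Agrees with the usual ordering, including across zero:
@assert int_lt(-2.0, -1.0) && int_lt(-1.0, 1.0) && !int_lt(2.0, 1.0)
```

This transform lets the sort use fast integer comparisons (and even a radix sort) instead of floating-point comparisons, which is part of why a custom lt can’t easily compete.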