I am interested in writing my own lt
function within a sort for better performance in a specific use case (in this case, I only care if the numbers are a set length apart from one another). However, it seems that any substitution of the lt
function besides isless
decreases performance.
Take this simple example:
@benchmark sort!(X, lt = isless) setup = (X = randn(1_000_000))
This benchmarks as
BenchmarkTools.Trial: 65 samples with 1 evaluation.
Range (min … max): 73.737 ms … 75.956 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 74.748 ms ┊ GC (median): 0.00%
Time (mean ± σ): 74.770 ms ± 469.959 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▁▁▁ ▄ ██ ▄ ▄▁▁▁▁▁▄ ▁
▆▁▆▁▆▆▁▁▁▁▁▁▆▁████▆▁▆▆█▆▆██▁█▆███████▁▆▆▆▆▆█▆▁▆▁▆▁▁▁▁▆▆▁▆▁▁▆ ▁
73.7 ms Histogram: frequency by time 75.8 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
However, the simple substitution can be made as:
new_lt(x::AbstractFloat, y::AbstractFloat) = x < y
@benchmark sort!(X, lt = new_lt) setup = (X = randn(1_000_000))
This should do the same thing, I think. If anything, it should bypass checks related to isnan
. However, the performance decreases, and the above code benchmarks as follows:
BenchmarkTools.Trial: 56 samples with 1 evaluation.
Range (min … max): 86.292 ms … 90.038 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 87.803 ms ┊ GC (median): 0.00%
Time (mean ± σ): 87.871 ms ± 885.583 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▁ ▁▁▁ ▁ ▄ ▁ ▄ ▁ ▁█
█▁▁▁█▁▁▆▆▆▁▁███▆▆▆▆▆▆▁█▆█▁▁▆█▆▁▆▁█▆▆█▆▆▆▁▆▁▁▆▆▆▁▁██▁▆▆▆▁▁▁▆▆ ▁
86.3 ms Histogram: frequency by time 89.4 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
I’m curious what the difference is here, and if there’s any way around it?