Hello everyone,
I’ve benchmarked Julia’s implementation of binary search (searchsortedlast
) to test whether Julia can match C performance (spoiler: it can). However, I found that isnan
checks within isless
lead to a 1.5–2x slowdown (depending on array size) when values from a sorted Vector{Float64}
are searched in another sorted Vector{Float64}
. When the array of search queries is not sorted, the slowdown due to isnan
becomes small compared to the slowdown due to irregular memory access pattern, but it’s still there.
In the current stable release (1.8.5), these checks can be turned off by --mathmode=fast
. However, in 1.9.0-beta3 this flag does nothing (#41638), and @fastmath
doesn’t work either, because searchsortedlast
is defined in Base.Sort
, not in my script.
So, my question is: if I write my code with carefully avoiding division by zero and other bad things, and I don’t need accuracy within 1 ulp, how can I apply @fastmath
to functions from other modules? Can we have something like @fastmath import Base: searchsortedlast
? Or a way for package developers to explicitly declare certain functions as “fastmath-safe” (@fastmathsafe export func1, func2
) so that the global --math-mode=fast
flag will apply only to them?
My code (nonan.jl):
if "nonan" ∈ ARGS
Base.isnan(x::T) where T <: AbstractFloat = false
end
using BenchmarkTools
import Random: seed!, shuffle
heading(str) = println("\n\n", str, "\n", "-"^length(str))
const T = Float64
const N = 2^20
Da = UInt32
seed!(collect(Da, "weed!"))
const arr = Vector{T}(sort(rand(1:N^2, N)))
const vals = Vector{T}(sort(rand(1:N^2, N)))
const slav = shuffle(vals)
function predecessor(x, arr)
@fastmath searchsortedlast(arr, x)
end
const Nsamples = 10
function run_benchmarks()
heading("Sorted queries")
benchmark = @benchmark [predecessor(val, $arr) for val in $vals] samples=Nsamples
println(IOContext(stdout, :compact => false), benchmark)
heading("Shuffled queries")
mbecnhkra = @benchmark [predecessor(lav, $arr) for lav in $slav] samples=Nsamples
println(IOContext(stdout, :compact => false), mbecnhkra)
end
run_benchmarks()
With Julia 1.8.5, julia nonan.jl
outputs
Sorted queries
--------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 125.497 ms … 154.638 ms ┊ GC (min … max): 0.00% … 0.18%
Time (median): 138.220 ms ┊ GC (median): 0.00%
Time (mean ± σ): 138.085 ms ± 11.044 ms ┊ GC (mean ± σ): 0.02% ± 0.06%
█ █ ▁ ▁ ▁ ▁ ▁ ▁
█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
125 ms Histogram: frequency by time 155 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
Shuffled queries
----------------
BenchmarkTools.Trial: 9 samples with 1 evaluation.
Range (min … max): 553.379 ms … 619.787 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 571.075 ms ┊ GC (median): 0.00%
Time (mean ± σ): 573.963 ms ± 19.642 ms ┊ GC (mean ± σ): 0.01% ± 0.02%
█ █ █ ██ ██ █ █
█▁█▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁██▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
553 ms Histogram: frequency by time 620 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
julia --math-mode=fast nonan.jl
outputs
Sorted queries
--------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 65.392 ms … 71.410 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 70.893 ms ┊ GC (median): 0.00%
Time (mean ± σ): 68.890 ms ± 2.780 ms ┊ GC (mean ± σ): 0.03% ± 0.10%
▁▁▁ ▁ ██ ▁ ▁
███▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁█▁█ ▁
65.4 ms Histogram: frequency by time 71.4 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
Shuffled queries
----------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 490.365 ms … 517.467 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 504.671 ms ┊ GC (median): 0.00%
Time (mean ± σ): 502.479 ms ± 10.312 ms ┊ GC (mean ± σ): 0.01% ± 0.02%
█ ▁ ▁ ▁ ▁ █ ▁ ▁
█▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█ ▁
490 ms Histogram: frequency by time 517 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
julia nonan.jl nonan
(disabling isnan
checks manually) gives similar results: 82 ms (median) for sorted queries and 494 ms for shuffled queries.
For 1.9.0-beta3, the results are similar except that --math-mode=fast
doesn’t work.
Summary of median times:
Sorted | Shuffled | |
---|---|---|
1.8.5 | ||
julia nonan.jl |
138 | 571 |
julia --math-mode=fast nonan.jl |
71 | 505 |
julia nonan.jl nonan |
82 | 494 |
1.9.0-beta3 | ||
julia nonan.jl |
144 | 655 |
julia --math-mode=fast nonan.jl |
144 | 638 |
julia nonan.jl nonan |
71 | 582 |