Hello everyone,

I’ve benchmarked Julia’s implementation of binary search (`searchsortedlast`

) to test whether Julia can match C performance (spoiler: it can). However, I found that `isnan`

checks within `isless`

lead to a 1.5–2x slowdown (depending on array size) when values from a sorted `Vector{Float64}`

are searched in another sorted `Vector{Float64}`

. When the array of search queries is not sorted, the slowdown due to `isnan`

becomes small compared to the slowdown due to irregular memory access pattern, but it’s still there.

In the current stable release (1.8.5), these checks can be turned off by `--mathmode=fast`

. However, in 1.9.0-beta3 this flag does nothing (#41638), and `@fastmath`

doesn’t work either, because `searchsortedlast`

is defined in `Base.Sort`

, not in my script.

So, my question is: if I write my code with carefully avoiding division by zero and other bad things, and I don’t need accuracy within 1 ulp, how can I apply `@fastmath`

to functions from other modules? Can we have something like `@fastmath import Base: searchsortedlast`

? Or a way for package developers to explicitly declare certain functions as “fastmath-safe” (`@fastmathsafe export func1, func2`

) so that the global `--math-mode=fast`

flag will apply only to them?

My code (nonan.jl):

```
if "nonan" ∈ ARGS
Base.isnan(x::T) where T <: AbstractFloat = false
end
using BenchmarkTools
import Random: seed!, shuffle
heading(str) = println("\n\n", str, "\n", "-"^length(str))
const T = Float64
const N = 2^20
Da = UInt32
seed!(collect(Da, "weed!"))
const arr = Vector{T}(sort(rand(1:N^2, N)))
const vals = Vector{T}(sort(rand(1:N^2, N)))
const slav = shuffle(vals)
function predecessor(x, arr)
@fastmath searchsortedlast(arr, x)
end
const Nsamples = 10
function run_benchmarks()
heading("Sorted queries")
benchmark = @benchmark [predecessor(val, $arr) for val in $vals] samples=Nsamples
println(IOContext(stdout, :compact => false), benchmark)
heading("Shuffled queries")
mbecnhkra = @benchmark [predecessor(lav, $arr) for lav in $slav] samples=Nsamples
println(IOContext(stdout, :compact => false), mbecnhkra)
end
run_benchmarks()
```

With Julia 1.8.5, `julia nonan.jl`

outputs

```
Sorted queries
--------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 125.497 ms … 154.638 ms ┊ GC (min … max): 0.00% … 0.18%
Time (median): 138.220 ms ┊ GC (median): 0.00%
Time (mean ± σ): 138.085 ms ± 11.044 ms ┊ GC (mean ± σ): 0.02% ± 0.06%
█ █ ▁ ▁ ▁ ▁ ▁ ▁
█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
125 ms Histogram: frequency by time 155 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
Shuffled queries
----------------
BenchmarkTools.Trial: 9 samples with 1 evaluation.
Range (min … max): 553.379 ms … 619.787 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 571.075 ms ┊ GC (median): 0.00%
Time (mean ± σ): 573.963 ms ± 19.642 ms ┊ GC (mean ± σ): 0.01% ± 0.02%
█ █ █ ██ ██ █ █
█▁█▁▁▁▁█▁▁▁▁▁▁▁██▁▁▁▁██▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
553 ms Histogram: frequency by time 620 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
```

`julia --math-mode=fast nonan.jl`

outputs

```
Sorted queries
--------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 65.392 ms … 71.410 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 70.893 ms ┊ GC (median): 0.00%
Time (mean ± σ): 68.890 ms ± 2.780 ms ┊ GC (mean ± σ): 0.03% ± 0.10%
▁▁▁ ▁ ██ ▁ ▁
███▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁█▁█ ▁
65.4 ms Histogram: frequency by time 71.4 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
Shuffled queries
----------------
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 490.365 ms … 517.467 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 504.671 ms ┊ GC (median): 0.00%
Time (mean ± σ): 502.479 ms ± 10.312 ms ┊ GC (mean ± σ): 0.01% ± 0.02%
█ ▁ ▁ ▁ ▁ █ ▁ ▁
█▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█ ▁
490 ms Histogram: frequency by time 517 ms <
Memory estimate: 8.00 MiB, allocs estimate: 2.
```

`julia nonan.jl nonan`

(disabling `isnan`

checks manually) gives similar results: 82 ms (median) for sorted queries and 494 ms for shuffled queries.

For 1.9.0-beta3, the results are similar except that `--math-mode=fast`

doesn’t work.

Summary of median times:

Sorted | Shuffled | |
---|---|---|

1.8.5 | ||

`julia nonan.jl` |
138 | 571 |

`julia --math-mode=fast nonan.jl` |
71 | 505 |

`julia nonan.jl nonan` |
82 | 494 |

1.9.0-beta3 | ||

`julia nonan.jl` |
144 | 655 |

`julia --math-mode=fast nonan.jl` |
144 | 638 |

`julia nonan.jl nonan` |
71 | 582 |