C implementation of function being ~4 times faster even absence of allocs

foobar_lv2 · March 4, 2025, 10:38am

See “PSA: Microbenchmarks remember branch history” for the effect that @vchuravy alluded to.

Floating-point numbers are annoying because of NaNs. It’s quite unsurprising that a generic Base Julia “findmin” function will kinda suck on floating-point numbers:

C/C++ are culturally fine with under-specifying their stuff, i.e. ignoring all corner cases and saying “UB if nan” unless some language-lawyer digs up a piece of spec that hints at a different interpretation (with different compilers / archs doing different things, because why not).
Julia is not fine with that! There will be some well-defined and considered behavior on NaN values. Whatever this behavior is, on some CPUs / data it will be slower than just ignoring the possibility.
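To make the "just ignore the possibility" option concrete, here is a minimal sketch. `branchless_max` is a hypothetical helper made up for this illustration (it is not a Base function, and not necessarily what any C compiler emits); it simply never asks whether an input is NaN.

```julia
# Hypothetical helper, not part of Base: one comparison, one select,
# and no special handling of NaN at all.
branchless_max(a, b) = ifelse(a > b, a, b)

# Every ordered comparison with NaN is false, so the answer depends on
# which argument the NaN happens to be in.
@show branchless_max(NaN, 1.0)
@show branchless_max(1.0, NaN)

# Base.max gives the same answer in both orders: the NaN propagates.
@show max(NaN, 1.0)
@show max(1.0, NaN)
```

The extra work `max` does to get an order-independent answer is exactly the kind of cost the cheap version avoids.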

That’s not because of a big fault of Julia; it’s because “give me the minimum of these numbers” is just a bad request, since it begs the follow-up questions “and which one do you want if there are NaNs? If there are multiple NaNs, which one do you want (NaN payload!)? If you also want the indices and the max index is non-unique, which one do you want? Suppose the max is zero; do you care about the difference between -0.0 and 0.0?”.
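Each of those questions can be poked at in the REPL. The following is a small sketch using only Base functions; the variable names and the hand-built bit pattern for the second NaN are just illustration.

```julia
# NaNs: ordered comparisons are all false, yet findmax still has to pick something.
v = [1.0, NaN, 3.0]
@show NaN > 1.0
@show NaN < 1.0
@show findmax(v)

# NaN payloads: two values that are both NaN but differ in their bits.
nan_a = NaN
nan_b = reinterpret(Float64, 0x7ff8000000000001)  # quiet NaN with a nonzero payload
@show isnan(nan_b)
@show nan_a === nan_b   # === compares the bit patterns

# Non-unique maximum: which index comes back?
@show findmax([1.0, 2.0, 2.0])

# Signed zeros: equal under ==, but ordered under isless.
@show -0.0 == 0.0
@show isless(-0.0, 0.0)
```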

To belabor these points:

julia> naive_findmax([1.0, NaN])
ERROR: TypeError: in typeassert, expected Int64, got a value of type Nothing

julia> w=[-0.0, 0.0];
julia> findmax(w)
(0.0, 2)
julia> naive_findmax(w)
(0.0, 1)

Are you really OK with that?
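For readers wondering how a function ends up behaving like that: the definition of `naive_findmax` is not repeated in this post, so the following is only a guess at one shape that would produce both results. `two_pass_findmax` is a hypothetical name, and the real function from the thread may well be written differently.

```julia
# Hypothetical reconstruction, NOT the naive_findmax from the thread.
function two_pass_findmax(v::Vector{Float64})
    m = maximum(v)                # pass 1: the maximal value only
    i = findfirst(==(m), v)::Int  # pass 2: first index whose element == m
    return (m, i)
end

# -0.0 == 0.0, so the search stops at index 1 even though 0.0 sits at index 2.
@show two_pass_findmax([-0.0, 0.0])

# maximum returns NaN, NaN == NaN is false, findfirst returns nothing,
# and the ::Int assertion fails.
try
    two_pass_findmax([1.0, NaN])
catch err
    @show typeof(err)
end
```

Both surprises come from the same place: the second pass uses `==`, which treats the two zeros as equal and never matches a NaN.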
