`ifelse(x<y, x, y)` much faster than `min()`

julia> using Chairmarks

julia> function g(dij)
    vmin = Inf
    for x in dij
        vmin = min(x, vmin)
    end
    return vmin
end

julia> function f(dij)
    vmin = Inf
    for x in dij
        vmin = ifelse(x < vmin, x, vmin)
    end
    return vmin
end

julia> a = rand(512);

julia> @be g($a)
Benchmark: 2864 samples with 8 evaluations
 min    3.591 μs
 median 3.823 μs
 mean   3.800 μs
 max    7.078 μs

julia> @be f($a)
Benchmark: 2959 samples with 69 evaluations
 min    428.188 ns
 median 428.478 ns
 mean   430.873 ns
 max    1.153 μs

is this expected?

4 Likes

I get very different results on aarch64, where we use llvm fmin and fmax:

julia> @be g($a)
Benchmark: 3758 samples with 637 evaluations
 min    34.733 ns
 median 34.929 ns
 mean   36.422 ns
 max    73.391 ns

julia> @be f($a)
Benchmark: 4406 samples with 30 evaluations
 min    600.000 ns
 median 631.950 ns
 mean   679.569 ns
 max    2.093 μs

What happens if you try to do the same on your machine (which I presume is x86_64)?

1 Like

Timings seem to be all over the place. This is in aarm64 (M3):

julia> @be f($(x))
Benchmark: 4028 samples with 45 evaluations
 min    455.556 ns
 median 490.733 ns
 mean   510.856 ns
 max    822.222 ns

julia> @be g($(x))
Benchmark: 3844 samples with 103 evaluations
 min    217.233 ns
 median 228.553 ns
 mean   234.833 ns
 max    424.757 ns

Your g seems to be still faster than f? What’s your input? You seem to be using at least a different name for the input (x instead of a) than what shown in the original post. Also, what version of Julia are you using? The code I showed above was introduced in Julia v1.10 with Use native fmin/fmax in aarch64 by gbaraldi · Pull Request #47814 · JuliaLang/julia · GitHub, if you have an older version of Julia you shouldn’t seem the same speedup.

Notice that @gragusa’s results are also against the advantage of ifelse over min. In that case, f (the version with ifelse) is reported first, and shows longer times.

1 Like

ok so looks like on aarch64, min() is faster than ifelse(), while on x86_64 (AMD 7840HS) it’s the other way around

1 Like

But what if you use llvm min/max on x86_64?

Which is what I said? But I thought that their point was that their g wasn’t as fast as mine, and I tried to suggest where to look at (for example Julia version).

julia> function g_llvm(dij)
           vmin = Inf
           for x in dij
               vmin = llvm_min(x, vmin)
           end
           return vmin
       end
g_llvm (generic function with 1 method)

julia> Base.@assume_effects :total @inline llvm_min(x::Float64, y::Float64) = ccall("llvm.minimum.f64", llvmcall, Float64, (Float64, Float64), x, y)
llvm_min (generic function with 1 method)

julia> @be g_llvm($a)
LLVM ERROR: Cannot select: 0xe841c70: f64 = fminimum 0xe842610, 0xe842760, REPL[2]:1 @[ REPL[1]:4 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ]
  0xe842610: f64,ch = load<(load (s64) from %ir.uglygep, !tbaa !51, !alias.scope !54, !noalias !55)> 0xeb3e8c0, 0xe841d50, undef:i64, essentials.jl:917 @[ array.jl:891 @[ REPL[1]:5 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ] ]
    0xe841d50: i64 = add 0xe842140, 0xe841ff0, essentials.jl:917 @[ array.jl:891 @[ REPL[1]:5 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ] ]
      0xe842140: i64,ch = CopyFromReg 0xeb3e8c0, Register:i64 %2, essentials.jl:917 @[ array.jl:891 @[ REPL[1]:5 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ] ]
        0xe842840: i64 = Register %2
      0xe841ff0: i64 = shl 0xe841ea0, Constant:i8<3>, essentials.jl:917 @[ array.jl:891 @[ REPL[1]:5 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ] ]
        0xe841ea0: i64,ch = CopyFromReg 0xeb3e8c0, Register:i64 %4, essentials.jl:917 @[ array.jl:891 @[ REPL[1]:5 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ] ]
          0xe842530: i64 = Register %4
        0xe841960: i8 = Constant<3>
    0xe841f80: i64 = undef
  0xe842760: f64,ch = CopyFromReg 0xeb3e8c0, Register:f64 %5, REPL[2]:1 @[ REPL[1]:4 @[ /home/akako/.julia/packages/Chairmarks/msbgo/src/macro_tools.jl:52 ] ]
    0xe841f10: f64 = Register %5
In function: julia_#3_3347

you can’t

Uh, I thought I had tried it at some point and it worked :thinking:

Ah, I think I tried llvm.minnum.f64, that’s what clang uses: Compiler Explorer. This reminded me I had asked a question about llvm.minnum vs llvm.minimum on slack some time ago.

My (limited) understanding of IEEE 754 is that ifelse(x<y, x, y) would not be a correct implementation of min(x, y). The standard seems to require that a NaN argument is propagated to the result (see §§6.2.3, 9.6). This is what Julia does:

julia> x = NaN; y = 0.0; (x < y ? x : y), min(x, y)
(0.0, NaN)

This makes the code slower unless the hardware supports IEEE min. With @fastmath you get your fast version.

4 Likes

Another problem of the ifelse version is that it’s non-commutative:

julia> ifelse_min(x, y) = ifelse(x < y, x, y)
ifelse_min (generic function with 1 method)

julia> ifelse_min(NaN, 3.14), ifelse_min(3.14, NaN)
(3.14, NaN)

julia> ifelse_min(0.0, -0.0), ifelse_min(-0.0, 0.0)
(-0.0, 0.0)

julia> min(NaN, 3.14), min(3.14, NaN)
(NaN, NaN)

julia> min(0.0, -0.0), min(-0.0, 0.0)
(-0.0, -0.0)

Another fun result, with the function f defined above:

julia> f([NaN])
Inf

If NaN values can be discarded, then it actually makes sense to return Inf. You get the minimum of an empty collection, which should be the largest value available (= the neutral element for the binary operation min). I sometimes wish Julia would do the same instead of giving an error.

You mean like

julia> minimum(Float64[]; init=Inf)
Inf

?

Exactly. Similarly to

julia> sum(Float64[])
0.0

where you can omit the init because the element type is known.

1 Like

As discussed above, the issue is that min(x, y) and ifelse(x<y, x, y) are not isequal in the presence of NaN or signed zeros. With @fastmath, we do use the latter because @fastmath permits “don’t worry about the result with NaN” and “-0.0 and +0.0 can be used interchangeably.”

The ifelse version is faster on x86 because there is a native (and vectorizable) instruction for that, but there isn’t for the full min (which requires -0.0 < +0.0 and NaN < -Inf, depend on the argument ordering in the ifelse case – although I would contend that another reasonable ordering would be -Inf < NaN < -floatmax()). aarch does have a native instruction with min’s semantics, which is why it offers better performance.

I believe the current min was implemented in #41709 and support for the aarch instructions was added in #47814. There’s an open issue to use the faster version in some known-safe cases #48487, but I see that as very difficult for extremely little gain, I’d rather just wait for x86 to maybe eventually add it as a native instruction (however long that might take, but I vaguely recall it being suggested in the most recent IEEE754 revison so maybe someday).