julia> using Chairmarks
julia> function g(dij)
           vmin = Inf
           for x in dij
               vmin = min(x, vmin)
           end
           return vmin
       end
julia> function f(dij)
           vmin = Inf
           for x in dij
               vmin = ifelse(x < vmin, x, vmin)
           end
           return vmin
       end
julia> a = rand(512);
julia> @be g($a)
Benchmark: 2864 samples with 8 evaluations
min 3.591 μs
median 3.823 μs
mean 3.800 μs
max 7.078 μs
julia> @be f($a)
Benchmark: 2959 samples with 69 evaluations
min 428.188 ns
median 428.478 ns
mean 430.873 ns
max 1.153 μs
I get very different results on aarch64, where we use LLVM fmin and fmax:
julia> @be g($a)
Benchmark: 3758 samples with 637 evaluations
min 34.733 ns
median 34.929 ns
mean 36.422 ns
max 73.391 ns
julia> @be f($a)
Benchmark: 4406 samples with 30 evaluations
min 600.000 ns
median 631.950 ns
mean 679.569 ns
max 2.093 μs
What happens if you try to do the same on your machine (which I presume is x86_64)?
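If you want to check what your own build does (the exact output depends on the Julia version and CPU, so take the names below as what I’d expect rather than a guarantee), you can inspect the code generated for the scalar min:
julia> @code_llvm debuginfo=:none min(1.0, 2.0)   # on aarch64 with Julia ≥ 1.10 I’d expect a call to llvm.minimum.f64
julia> @code_native debuginfo=:none min(1.0, 2.0) # on x86_64 this should show a longer comparison/bit-twiddling sequence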
Timings seem to be all over the place. This is on aarch64 (an M3):
julia> @be f($(x))
Benchmark: 4028 samples with 45 evaluations
min 455.556 ns
median 490.733 ns
mean 510.856 ns
max 822.222 ns
julia> @be g($(x))
Benchmark: 3844 samples with 103 evaluations
min 217.233 ns
median 228.553 ns
mean 234.833 ns
max 424.757 ns
Your g still seems to be faster than f? What’s your input? You seem to be using at least a different name for the input (x instead of a) than what is shown in the original post. Also, what version of Julia are you using? The code I showed above was introduced in Julia v1.10 with Use native fmin/fmax in aarch64 by gbaraldi · Pull Request #47814 · JuliaLang/julia · GitHub; if you have an older version of Julia you shouldn’t see the same speedup.
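For example, posting something like this along with the benchmarks would help (the output here is only illustrative):
julia> VERSION, Sys.ARCH
(v"1.10.0", :aarch64)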
Notice that @gragusa’s results also go against the advantage of ifelse over min. In that case, f (the version with ifelse) is reported first, and shows longer times.
Which is what I said? But I thought their point was that their g wasn’t as fast as mine, and I tried to suggest where to look (for example, the Julia version).
My (limited) understanding of IEEE 754 is that ifelse(x<y, x, y) would not be a correct implementation of min(x, y). The standard seems to require that a NaN argument be propagated to the result (see §§6.2.3, 9.6). This is what Julia does:
julia> x = NaN; y = 0.0; (x < y ? x : y), min(x, y)
(0.0, NaN)
This makes the code slower unless the hardware supports IEEE min. With @fastmath you get your fast version.
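For instance, a minimal sketch of the same loop written with @fastmath (the name g_fast is mine, not from the posts above):
julia> function g_fast(dij)
           vmin = Inf
           for x in dij
               vmin = @fastmath min(x, vmin)   # rewritten by the macro to Base.FastMath.min_fast
           end
           return vmin
       end
On x86_64 I’d expect this to benchmark close to the ifelse version f above, since min_fast is allowed to ignore NaN and signed zeros.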
If NaN values can be discarded, then it actually makes sense to return Inf. You get the minimum of an empty collection, which should be the largest value available (= the neutral element for the binary operation min). I sometimes wish Julia would do the same instead of giving an error.
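You can already opt into that behaviour by supplying the neutral element explicitly (the init keyword needs Julia ≥ 1.6):
julia> minimum(Float64[])              # throws: reducing over an empty collection
julia> minimum(Float64[]; init=Inf)
Inf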
As discussed above, the issue is that min(x, y) and ifelse(x<y, x, y) are not isequal in the presence of NaN or signed zeros. With @fastmath, we do use the latter because @fastmath permits “don’t worry about the result with NaN” and “-0.0 and +0.0 can be used interchangeably.”
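The signed-zero case, for instance:
julia> min(-0.0, 0.0), ifelse(-0.0 < 0.0, -0.0, 0.0)
(-0.0, 0.0)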
The ifelse version is faster on x86 because there is a native (and vectorizable) instruction for it, but there isn’t one for the full min, which requires -0.0 < +0.0 and NaN < -Inf, whereas the result in the ifelse case depends on the argument ordering (although I would contend that another reasonable ordering would be -Inf < NaN < -floatmax()). AArch64 does have a native instruction with min’s semantics, which is why it offers better performance there.
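And with NaN, the result of the ifelse formulation really does depend on which argument comes first:
julia> ifelse(NaN < 0.0, NaN, 0.0), ifelse(0.0 < NaN, 0.0, NaN)
(0.0, NaN)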
I believe the current min was implemented in #41709 and support for the AArch64 instructions was added in #47814. There’s an open issue about using the faster version in some known-safe cases, #48487, but I see that as very difficult for extremely little gain; I’d rather just wait for x86 to maybe eventually add it as a native instruction (however long that might take, but I vaguely recall it being suggested in the most recent IEEE 754 revision, so maybe someday).