# Ifelse vs ternary with tuples

I believe I have the basic idea of when to use `ifelse` instead of the ternary operator. Basically, `ifelse` avoids an expensive branch, but computes the two possibilities β which may be more expensive. (Also `ifelse` may be friendler to autodiff.) But now throw tuples into the mix, and I have questions. Specifically, looking at these lines in the Julia source:

``````max(x::T, y::T) where {T<:Real} = ifelse(y < x, x, y)
min(x::T, y::T) where {T<:Real} = ifelse(y < x, y, x)
minmax(x::T, y::T) where {T<:Real} = y < x ? (y, x) : (x, y)
``````

For `min` and `max`, thereβs nothing to compute between `x` and `y`, so just use `ifelse`. But for `minmax`, would those tuples actually get created? I have this very fuzzy notion that tuples are basically free, and the compiler knows how to deal with them very well. So, is this code saying theyβre not that free?

For example, if `x` and `y` are `BigFloat`s (which do require allocation), would `max` and `min` get away without allocating somehow? Would `minmax` have to allocate four `BigFloat`s to fill the two tuples, and then just return one of the tuples?

Things get a little confused because the compiler itself can β and does β do transformations from `?:` to `ifelse` (edit: and vice-versa!) when it can prove itβs advantageous (and ok) to do so. And itβs even more confused because branches arenβt necessarily bad (if theyβre predictable and donβt mess up SIMD).

Those lines of code are pretty old, too β 10+ years β so Iβm not sure if theyβre necessarily instructive or optimal either.

In fact, Julia v1.11 (but not v1.10) compiles that minmax to the branchless select.

6 Likes

I think the best guidance these days is to just use `?:` branches and let the compiler handle it unless youβre hand-crafting SIMD kernels or wanting the convenience (and, e.g., broadcast-ability) of a function.

4 Likes

The above poster beat me to most of this answer, but I think this still adds something so Iβm posting it:

The compiler gets its hands on any of these and does what it wants (within the range itβs allowed). Itβs common that ternary expressions get compiled to be branchless and also that a nontrivial `ifelse` uses a branch.

There is a semantic difference in that `ifelse` must always execute both arguments and `?:` must only execute the correct side, but this distinction only matters when either has side-effects. For example, `ifelse(true, 1, error())` must error while `true ? 1 : error()` must not.

In general, it tends to be slightly easier for the compiler to make branch-free code using `ifelse` rather than `?:`. This is why it is desirable when one is attempting to write vectorizable code for hot loops. However, `?:` technically allows the compiler more freedom in most situations where `ifelse` is relevant because it is allowed to not evaluate one side entirely (including side effects).

Since forming a tuple from two objects does not have side effects, itβs likely that either a `?:` or `ifelse` implementation of the functions in the original question result in the same compiled code. `code_native` is useful here.

``````julia> code_native((x,y) -> y < x ? (y, x) : (x, y), NTuple{2,Float64}; debuginfo=:none)

julia> code_native((x,y) -> ifelse(y < x, (y, x), (x, y)), NTuple{2,Float64}; debuginfo=:none)
``````

On my x86 running Julia v1.10, I find the assembly of the `ifelse` more attractive in this case (the ternary does emit a branch while the `ifelse` does not). However, changing the comparison to `y <= x` emits branches for both versions and I slightly prefer the ternary.

Keep in mind that a function this small is likely to be inlined, which means that it may interact with other code in significant ways. This means that this is probably not the end of the story. For example, in a loop the compiler may eliminate the branch in order to enable vectorization.

6 Likes

Thanks for the explanations. I guess, as usual, the way to go is to just write code that reads well, and donβt worry about details like this until a profiler says to β then experiment to see what the compiler actually does.

2 Likes

Note that the compiler tends to prefer using branches, under the assumption that well predicted code.

One example where this backfires is in binary searches, which are generally impossible to predict.
I get

``````julia> using Random, BenchmarkTools

julia> x = rand(Int, 2048); s = sort(x);

julia> perm = randperm(length(x));

julia> function findbench(f, x, vals)
@inbounds for i = eachindex(x, vals)
v = vals[i]
f(x[v], x) == v || throw("oops")
end
end
findbench (generic function with 1 method)

julia> @benchmark findbench((x,v)->searchsortedfirst(v, x), \$s, \$perm)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max):  94.526 ΞΌs β¦ 117.188 ΞΌs  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     95.964 ΞΌs               β GC (median):    0.00%
Time  (mean Β± Ο):   96.209 ΞΌs Β±   1.087 ΞΌs  β GC (mean Β± Ο):  0.00% Β± 0.00%

ββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
94.5 ΞΌs         Histogram: frequency by time         99.6 ΞΌs <

Memory estimate: 0 bytes, allocs estimate: 0.
``````

However, if I start Julia via `JULIA_LLVM_ARGS="-x86-cmov-converter=false" julia`, which disables the cmov->branch conversion (so our code is actually branchless), I get

``````julia> @benchmark findbench((x,v)->searchsortedfirst(v, x), \$s, \$perm)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max):  40.656 ΞΌs β¦ 62.645 ΞΌs  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     40.919 ΞΌs              β GC (median):    0.00%
Time  (mean Β± Ο):   41.682 ΞΌs Β±  2.223 ΞΌs  β GC (mean Β± Ο):  0.00% Β± 0.00%

βββ                 βββ               βββ          βββ      β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
40.7 ΞΌs      Histogram: log(frequency) by time      49.9 ΞΌs <

Memory estimate: 0 bytes, allocs estimate: 0.
``````

Would be nice if we had a way to tell the compiler βno really, I do want this particular code to be branchless!β without using a hammer that impacts all code like that `ENV` variable.

7 Likes

I didnβt know we could do this! But yeah it would be really nice if we could tell which parts do branchless/branchful.

I did implement one with `@asmcall` when I was trying to learn more about asm, the performance is comparable. Although, you start to get worse performance if your array does not fit L1/L2 cache.