Ifelse vs ternary with tuples

moble · February 19, 2024, 9:42pm

I believe I have the basic idea of when to use ifelse instead of the ternary operator. Basically, ifelse avoids an expensive branch, but computes the two possibilities — which may be more expensive. (Also ifelse may be friendler to autodiff.) But now throw tuples into the mix, and I have questions. Specifically, looking at these lines in the Julia source:

max(x::T, y::T) where {T<:Real} = ifelse(y < x, x, y)
min(x::T, y::T) where {T<:Real} = ifelse(y < x, y, x)
minmax(x::T, y::T) where {T<:Real} = y < x ? (y, x) : (x, y)

For min and max, there’s nothing to compute between x and y, so just use ifelse. But for minmax, would those tuples actually get created? I have this very fuzzy notion that tuples are basically free, and the compiler knows how to deal with them very well. So, is this code saying they’re not that free?

For example, if x and y are BigFloats (which do require allocation), would max and min get away without allocating somehow? Would minmax have to allocate four BigFloats to fill the two tuples, and then just return one of the tuples?

mbauman · February 19, 2024, 9:53pm

Things get a little confused because the compiler itself can — and does — do transformations from ?: to ifelse (edit: and vice-versa!) when it can prove it’s advantageous (and ok) to do so. And it’s even more confused because branches aren’t necessarily bad (if they’re predictable and don’t mess up SIMD).

Those lines of code are pretty old, too — 10+ years — so I’m not sure if they’re necessarily instructive or optimal either.

In fact, Julia v1.11 (but not v1.10) compiles that minmax to the branchless select.

mbauman · February 19, 2024, 10:08pm

I think the best guidance these days is to just use ?: branches and let the compiler handle it unless you’re hand-crafting SIMD kernels or wanting the convenience (and, e.g., broadcast-ability) of a function.

mikmoore · February 19, 2024, 10:14pm

The above poster beat me to most of this answer, but I think this still adds something so I’m posting it:

The compiler gets its hands on any of these and does what it wants (within the range it’s allowed). It’s common that ternary expressions get compiled to be branchless and also that a nontrivial ifelse uses a branch.

There is a semantic difference in that ifelse must always execute both arguments and ?: must only execute the correct side, but this distinction only matters when either has side-effects. For example, ifelse(true, 1, error()) must error while true ? 1 : error() must not.

In general, it tends to be slightly easier for the compiler to make branch-free code using ifelse rather than ?:. This is why it is desirable when one is attempting to write vectorizable code for hot loops. However, ?: technically allows the compiler more freedom in most situations where ifelse is relevant because it is allowed to not evaluate one side entirely (including side effects).

Since forming a tuple from two objects does not have side effects, it’s likely that either a ?: or ifelse implementation of the functions in the original question result in the same compiled code. code_native is useful here.

julia> code_native((x,y) -> y < x ? (y, x) : (x, y), NTuple{2,Float64}; debuginfo=:none)

julia> code_native((x,y) -> ifelse(y < x, (y, x), (x, y)), NTuple{2,Float64}; debuginfo=:none)

On my x86 running Julia v1.10, I find the assembly of the ifelse more attractive in this case (the ternary does emit a branch while the ifelse does not). However, changing the comparison to y <= x emits branches for both versions and I slightly prefer the ternary.

Keep in mind that a function this small is likely to be inlined, which means that it may interact with other code in significant ways. This means that this is probably not the end of the story. For example, in a loop the compiler may eliminate the branch in order to enable vectorization.

moble · February 20, 2024, 1:17am

Thanks for the explanations. I guess, as usual, the way to go is to just write code that reads well, and don’t worry about details like this until a profiler says to — then experiment to see what the compiler actually does.

Elrod · February 21, 2024, 4:36am

Note that the compiler tends to prefer using branches, under the assumption that well predicted code.

One example where this backfires is in binary searches, which are generally impossible to predict.
I get

julia> using Random, BenchmarkTools

julia> x = rand(Int, 2048); s = sort(x);

julia> perm = randperm(length(x));

julia> function findbench(f, x, vals)
           @inbounds for i = eachindex(x, vals)
               v = vals[i]
               f(x[v], x) == v || throw("oops")
           end
       end
findbench (generic function with 1 method)

julia> @benchmark findbench((x,v)->searchsortedfirst(v, x), $s, $perm)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  94.526 μs … 117.188 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     95.964 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   96.209 μs ±   1.087 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁▂▄▄▆▇█▇▆▆▄▂                                        
  ▂▂▂▂▂▃▄▄▆▇█████████████▆▅▄▄▃▃▂▂▂▂▂▂▂▁▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▂▂ ▄
  94.5 μs         Histogram: frequency by time         99.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

However, if I start Julia via JULIA_LLVM_ARGS="-x86-cmov-converter=false" julia, which disables the cmov->branch conversion (so our code is actually branchless), I get

julia> @benchmark findbench((x,v)->searchsortedfirst(v, x), $s, $perm)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  40.656 μs … 62.645 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     40.919 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.682 μs ±  2.223 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▆                 ▁▁▁               ▂▂▁          ▁▂▁      ▁
  ███▇▃▁▁▃▁▁▃▁▁▁▁▃▄▆▇██████▆▅▄▃▁▄▄▄▄▁▄▃▆███▆▁▁▁▃▁▁▁▁▅████▆▅▅▆ █
  40.7 μs      Histogram: log(frequency) by time      49.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Would be nice if we had a way to tell the compiler “no really, I do want this particular code to be branchless!” without using a hammer that impacts all code like that ENV variable.

AMJ · February 21, 2024, 9:03am

I didn’t know we could do this! But yeah it would be really nice if we could tell which parts do branchless/branchful.

I did implement one with @asmcall when I was trying to learn more about asm, the performance is comparable. Although, you start to get worse performance if your array does not fit L1/L2 cache.

Topic		Replies	Views
Why isn't minmax branchless? Internals & Design	22	4575	July 20, 2021
`ifelse(x<y, x, y)` much faster than `min()` Performance question	16	400	October 28, 2024
Why is ternary operator two times faster than if statement? New to Julia	4	4468	November 30, 2016
Multiplying by booleans faster than if/else Performance	23	3833	July 8, 2021
Julia Pragma Macros Internals & Design	4	419	February 21, 2024

Ifelse vs ternary with tuples

Related topics