Multiple Outputs (Tuple) from IfElse with VectorizationBase

Hi there,

I am trying to experiment a bit with VectorizationBase to create functions that can be used inside the `@turbo` macro from LoopVectorization.

I have a use case where my computation can be significantly sped up when one of my inputs is 0. In fact, the full routine would produce NaN values at points where the input is 0. So I was thinking of writing two subfunctions that accept `Vec` inputs from VectorizationBase and selecting between them with `IfElse.ifelse`.

Since I am dealing with complex outputs and LoopVectorization does not currently support complex inputs, I would need to return two values from the subfunctions (that is, two sets of `Vec`, one for the real and one for the imaginary part of the result).
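To make the "two outputs instead of one complex number" idea concrete, here is a minimal plain-Julia sketch. The kernel `unit_phasor` and the array names are hypothetical and only for illustration; the point is just that a function can return the real and imaginary parts as a tuple and the complex result can be reassembled afterwards.

```julia
# Hypothetical kernel: returns the real and imaginary parts of exp(i*x)
# as two separate real numbers instead of one Complex value.
unit_phasor(x) = (cos(x), sin(x))

xs = [0.0, 0.5, 1.0]
real_part = similar(xs)
imag_part = similar(xs)

# Store the two outputs in separate real-valued arrays.
for i in eachindex(xs)
    real_part[i], imag_part[i] = unit_phasor(xs[i])
end

# Reassemble the complex result once the real-valued work is done.
z = complex.(real_part, imag_part)
```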

Unfortunately it seems that `IfElse.ifelse` fails as soon as I want to produce more than one output from my subfunctions.

Here is a MWE of the problem I am facing.

```julia
using IfElse, VectorizationBase

vx = Vec(ntuple(_ -> rand(0:3), VectorizationBase.pick_vector_width(Int64))...)

_simple(x) = (x, x)

_full(x) = (10x, 5x)

IfElse.ifelse(vx == 0, _simple(vx)[1], _full(vx)[1])
```

The above example works, but if one removes the indexing inside the `ifelse` as follows:
`IfElse.ifelse(vx == 0, _simple(vx), _full(vx))`

I get the following error:

I could use two `ifelse` statements to gather both outputs separately, but I guess that would re-perform all the computations inside the subfunctions twice.

Is getting two (or more) outputs out of the `ifelse` possible or is it not supported yet?

Tagging @Elrod as he might be able to shed light on this.

And you wouldn't have to recompute:

```julia
s = _simple(vx)
f = _full(vx)
x = IfElse.ifelse(vx == 0, s[1], f[1])
y = IfElse.ifelse(vx == 0, s[2], f[2])
```

```julia
VectorizationBase.data(IfElse.ifelse(vx == 0, VecUnroll(_simple(vx)), VecUnroll(_full(vx))))
```

Thanks a lot,

Looking at your first proposed solution, does this imply that whenever you have an `ifelse` branch, both functions are evaluated for each input before the selection with the mask?

So there would be no actual time saving from branching inside the `@turbo` macro, since both the simple and the full versions are evaluated for each input?

Yes.
Much of LoopVectorization's performance benefit comes from SIMD, which stands for "Single Instruction Multiple Data".

Basically, it applies each instruction to multiple data points. Trying to do this while handling branches is tricky.
Referring to each iteration of a loop / element being processed as a "lane", the simplest approach is to have each lane take both sides of the branch and then combine the results afterwards based on which side of the branch each particular lane would've taken.
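The "take both sides, then combine" idea can be emulated in plain Julia with tuples standing in for SIMD lanes. This is only an illustrative sketch, not how LoopVectorization is implemented: `simple_branch` and `full_branch` are hypothetical stand-ins, with the full path chosen so that it yields NaN at 0, as in the original question.

```julia
# Plain-Julia emulation of lane-wise selection; no SIMD packages involved.
simple_branch(x) = one(x)        # hypothetical cheap path, valid at x == 0
full_branch(x) = sin(x) / x      # hypothetical full path, NaN at x == 0

lanes = (0.0, 0.5, 0.0, 2.0)     # four "lanes" of input
mask = map(iszero, lanes)        # true where the cheap path applies

# Both branches are evaluated for every lane, whatever the mask says...
s = map(simple_branch, lanes)
f = map(full_branch, lanes)      # lanes 1 and 3 hold NaN here

# ...and the mask only decides which result each lane keeps.
blended = map(ifelse, mask, s, f)
```

The NaN values computed by the full path in the zero lanes are discarded by the selection, but the work of computing them is still done.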

If you have a rarely taken path that is also very slow, e.g. you have a function that needs lots of special handling over a certain rarely encountered range of the input, you could insert an actual branch and check `if VectorizationBase.vany(condition)` to only evaluate that branch if it is actually needed.
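A minimal sketch of that suggestion, reusing `_simple` and `_full` from the MWE and assuming the `vx == 0` case is the rare one. The wrapper `select_outputs` is a hypothetical name, and the sketch operates on a bare `Vec`; it has not been checked inside a `@turbo` loop.

```julia
using IfElse, VectorizationBase

_simple(x) = (x, x)      # path for lanes where x == 0 (from the MWE)
_full(x) = (10x, 5x)     # stand-in for the full routine (from the MWE)

# Hypothetical wrapper: the rare path is evaluated and blended in
# only when at least one lane of the mask is set.
function select_outputs(vx)
    m = vx == 0
    f = _full(vx)
    if VectorizationBase.vany(m)
        s = _simple(vx)
        return IfElse.ifelse(m, s[1], f[1]), IfElse.ifelse(m, s[2], f[2])
    end
    return f             # no lane needed the rare path
end

vx = Vec(ntuple(_ -> rand(0:3), VectorizationBase.pick_vector_width(Int64))...)
x, y = select_outputs(vx)
```

The check is made once per vector rather than once per lane, so the rare path is skipped only when no lane in the whole vector needs it. This pays off only if the guarded path is expensive enough to outweigh the cost of the branch.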
