Multiple Outputs (Tuple) from IfElse with VectorizationBase

disberd · September 16, 2021, 11:34am

Hi there,

I am trying to experiment a bit with VectorizationBase to create functions that can be used inside the @turbo macro from LoopVectorization.

I have a use case where my computation can be greatly simplified when one of my inputs is 0. This is not only a speed-up: the full routine would produce NaN values at points where the input is 0. So I was thinking of writing two subfunctions that accept Vec inputs from VectorizationBase and selecting between them with IfElse.ifelse.

Since I am dealing with Complex outputs and LoopVectorization does not currently support complex inputs, I would need to return two values from each subfunction (that is, two Vecs: one for the real part and one for the imaginary part of the result).

Unfortunately, it seems that IfElse.ifelse fails as soon as the subfunctions return more than one output.

Here is a MWE of the problem I am facing.

using IfElse, VectorizationBase

vx = Vec(ntuple(_ -> rand(0:3), VectorizationBase.pick_vector_width(Int64))...)

_simple(x) = (x,x)

_full(x) = (10x,5x)

IfElse.ifelse(vx == 0,_simple(vx)[1],_full(vx)[1])

The above example works, but if one removes the indexing inside the ifelse, as follows:
IfElse.ifelse(vx == 0,_simple(vx),_full(vx))

I get the following error:

I could use two ifelse statements to gather the two outputs separately, but I guess that would re-perform all the computations inside the subfunctions twice.

Is getting two (or more) outputs out of the ifelse possible or is it not supported yet?

Tagging @Elrod as he might be able to shed some light on this.

Elrod · September 16, 2021, 12:53pm

I could add a method.
And you wouldn’t have to recompute:

s = _simple(vx)
f = _full(vx)
x = IfElse.ifelse(vx == 0, s[1], f[1])
y = IfElse.ifelse(vx == 0, s[2], f[2])

Alternatively, this should already work:

VectorizationBase.data(IfElse.ifelse(vx == 0, VecUnroll(_simple(vx)), VecUnroll(_full(vx))))

disberd · September 16, 2021, 2:22pm

Thanks a lot,

Looking at your first proposed solution, does this imply that whenever you have an ifelse branch, both functions are evaluated for each input before the selection with the mask is done?

So there would be no actual time saving from branching inside the @turbo macro, since both the simple and the full version are evaluated for each input?

Elrod · September 16, 2021, 2:36pm

Yes.
Much of LoopVectorization’s performance benefit comes from SIMD, which stands for “Single Instruction Multiple Data”.

Basically, it applies each instruction to multiple data points. Trying to do this while handling branches is tricky.
Referring to each iteration of a loop / element being processed as a “lane”, the simplest approach is to have each lane take both sides of the branch and then combine the results afterwards based on which side of the branch each particular lane would’ve taken.

If you have a rarely taken path that is also very slow, e.g. you have a function that needs lots of special handling over a certain rarely encountered range of the input, you could insert an actual branch and check if VectorizationBase.vany(condition) to only evaluate that branch if it is actually needed.
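A minimal sketch of that idea, reusing _simple and _full from the MWE above. The wrapper name select_outputs is hypothetical, and the sketch assumes the zero-input case is the rarely taken one; if the expensive routine is the rare one instead, the roles of the two functions would be swapped. It is meant for direct use on Vec values, not as a statement of what @turbo does with such a function.

```julia
using IfElse, VectorizationBase

_simple(x) = (x, x)
_full(x) = (10x, 5x)

# Hypothetical helper: the common path is always computed; the rare path
# is computed only when at least one lane of the mask is set.
function select_outputs(vx)
    m = vx == 0                      # one mask bit per lane
    f = _full(vx)                    # evaluated for every lane
    if VectorizationBase.vany(m)     # real branch on the whole vector
        s = _simple(vx)              # evaluated only if some lane needs it
        # Blend per output, so no tuple method of ifelse is required.
        return (IfElse.ifelse(m, s[1], f[1]), IfElse.ifelse(m, s[2], f[2]))
    end
    return f
end

vx = Vec(0, 1, 2, 3)
x, y = select_outputs(vx)
```

Both return paths yield a tuple of two Vecs of the same type, so the function stays type stable. The saving only materializes when whole vectors skip the rare path; a single lane that needs it makes the entire vector pay for both.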
