Hello,
I have some Vectors of StaticArrays and I use them in functions like this:
fun(vec[1],vec[2],vec[3])
where each position of vec is a StaticArray.
It would be helpful to do something like this:
fun(vec[1:3])
fun(vec[1:3]...,)
because I’m declaring these functions with 3 or 4 inputs. However, I notice a performance penalty both from splatting the slice into separate arguments (...) and from passing the 3 StaticArrays as a single vector.
Is there a way to do this that is as fast as fun(vec[1],vec[2],vec[3]), but would let me pass 3 or 4 inputs to fun?
The positions 1:3 are placeholders here. In reality, they are never consecutive and come from another vector of StaticArrays (vec[indices[n][1]], vec[indices[n][2]], vec[indices[n][3]] or vec[indices[n]], to have them grouped).
Thanks a lot!
You could explicitly define two methods, one with 3 and the other with 4 input vectors. What happens to the fourth vector when the function is called with only 3?
If it is taken as zero you can do something like
fun(v1::T,v2::T,v3::T) where T = fun(v1,v2,v3,zero(T))
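A minimal sketch of that two-method pattern, assuming the StaticArrays package is installed and that a zero fourth vector is acceptable. The kernel body (a plain sum) and the name `vs` are hypothetical stand-ins, since the thread never shows what `fun` computes:

```julia
using StaticArrays  # assumption: the package the thread is about is installed

# Hypothetical 4-argument kernel: here it just sums the four static vectors.
fun(v1::T, v2::T, v3::T, v4::T) where {T<:SVector} = v1 + v2 + v3 + v4

# 3-argument method: forwards to the 4-argument one with a zero vector.
fun(v1::T, v2::T, v3::T) where {T<:SVector} = fun(v1, v2, v3, zero(T))

# `vs` plays the role of `vec` in the thread.
vs = [SVector(1.0, 2.0, 3.0) for _ in 1:4]

r3 = fun(vs[1], vs[2], vs[3])         # dispatches to the 3-argument method
r4 = fun(vs[1], vs[2], vs[3], vs[4])  # dispatches to the 4-argument method
```

Because `zero(T)` is known from the type alone, the forwarding method should add little or no cost, but that is worth confirming on the real kernel.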
@lmiq I have done this. The question is how to call these functions in a performant way without explicitly writing out the 3 or 4 inputs. So far, fun(vec[1],vec[2],vec[3]) is by far the fastest way that I could think of doing this (compared to the other options in my original post).
@oxinabox I can test that, but how do I use ntuple with arbitrary indices? I can do ntuple(i -> vec[i], 3) to get vec[1:3], but how can I get (vec[4], vec[8], vec[2])?
Thanks a lot!
Why would it? It is just two indexing operations: indices[i], and then using its result to index vec. It would make sense for the anonymous function (i->vec[indices[i]]) to be allocated, but it is probably inlined.
I don’t know why it’s fast, but it is. The compiler must be doing something fancy. When I try it on a super simple cut-down example, using ntuple performs the same as the ..., splat, but inside my function it works great!
The syntax is so obtuse that VSCode highlights it as a possible error, though =P.
@lmiq I’m using fun(ntuple(i->vec[indices[k][i]],3));
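For the earlier question of how to get something like (vec[4], vec[8], vec[2]), here is a hedged sketch of gathering arbitrary positions with ntuple and splatting the result. It assumes StaticArrays is installed and that each index group is stored as a tuple of Ints. The helpers `gather` and `call_fun`, the name `vs`, and the body of `fun` are hypothetical:

```julia
using StaticArrays  # assumption: installed

# Hypothetical stand-in for the thread's `fun`, with 3- and 4-argument methods.
fun(a, b, c) = a + b + c
fun(a, b, c, d) = a + b + c + d

# Gather vec[idx[1]], ..., vec[idx[N]] into a tuple. Because `idx` is an
# NTuple, N is part of its type, and Val(N) passes that length to ntuple
# as a compile-time constant.
gather(vec, idx::NTuple{N,Int}) where {N} = ntuple(i -> vec[idx[i]], Val(N))

# Splat the gathered tuple so `fun` receives 3 or 4 separate arguments.
call_fun(vec, idx) = fun(gather(vec, idx)...)

# `vs` plays the role of `vec` in the thread.
vs = [SVector(Float64(i), 0.0, 0.0) for i in 1:8]
indices3 = [(4, 8, 2), (1, 3, 5)]   # non-consecutive positions
indices4 = [(4, 8, 2, 1)]

r3 = call_fun(vs, indices3[1])      # same as fun(vs[4], vs[8], vs[2])
r4 = call_fun(vs, indices4[1])      # same as fun(vs[4], vs[8], vs[2], vs[1])
```

Splatting a tuple whose length is known from its type is generally much cheaper than splatting a Vector, whose length is only known at run time. Whether it matches the hand-written call in a given function still needs to be measured.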
Note that I said “I benchmark carefully” and that for “most performance critical code I find ntuple fastest”.
There is no contradiction.
You too should benchmark carefully with something representative of your use case.
The two major cases I have were tuples that needed to correspond to fields of a struct, and tuples that corresponded to dimensions of an array.
In both cases we are talking about much less than 10 elements.
Benchmarking is crucial when you are trying to put the last shine on something.
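As a rough illustration of such a comparison, the sketch below times three call styles from this thread using only Base timing macros. It assumes StaticArrays is installed. The kernel `fun`, the data sizes, and all function names are hypothetical, and the timings will differ between machines and Julia versions:

```julia
using StaticArrays  # assumption: installed

fun(a, b, c) = a + b + c   # hypothetical kernel

# Style 1: every argument written out by hand.
function explicit(vec, indices)
    s = zero(eltype(vec))
    for idx in indices
        s += fun(vec[idx[1]], vec[idx[2]], vec[idx[3]])
    end
    return s
end

# Style 2: gather into a tuple with ntuple, then splat.
function with_ntuple(vec, indices)
    s = zero(eltype(vec))
    for idx in indices
        s += fun(ntuple(i -> vec[idx[i]], Val(3))...)
    end
    return s
end

# Style 3: index with a Vector of positions and splat the resulting Vector.
# This allocates a temporary array, and the splat length is not known statically.
function with_slice(vec, indices)
    s = zero(eltype(vec))
    for idx in indices
        s += fun(vec[idx]...)
    end
    return s
end

vs = [rand(SVector{3,Float64}) for _ in 1:1000]
idx_tuples = [(rand(1:1000), rand(1:1000), rand(1:1000)) for _ in 1:10_000]
idx_vecs = [collect(t) for t in idx_tuples]   # same positions, stored as Vectors

for (name, f, idx) in (("explicit", explicit, idx_tuples),
                       ("ntuple", with_ntuple, idx_tuples),
                       ("slice", with_slice, idx_vecs))
    f(vs, idx)                    # warm-up call so compilation is not timed
    t = @elapsed f(vs, idx)
    println(rpad(name, 10), t, " s")
end
```

A single `@elapsed` measurement is noisy. For anything beyond a quick check, the BenchmarkTools package and its `@btime` macro are the usual choice, run on code that resembles the real use case.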