Performance tips on turning arrays into tuples

Hello,
I have some Vectors of StaticArrays and I use them in functions like this:

fun(vec[1],vec[2],vec[3])

where each position of vec is a StaticArray.
It would be helpful to do something like this:

fun(vec[1:3])
fun(vec[1:3]...,)

because I’m declaring these functions with 3 or 4 inputs. I notice a performance penalty from turning the vectors into a tuple (...) and from passing the 3 StaticArrays as a single vector.
Is there a way to do this that is as fast as fun(vec[1],vec[2],vec[3]), but would let me pass 3 or 4 inputs to fun?
The positions 1:3 are placeholders here. In reality, they are never consecutive and come from another vector of StaticArrays (vec[indices[n][1]], vec[indices[n][2]], vec[indices[n][3]] or vec[indices[n]], to have them grouped).
Thanks a lot!

You could explicitly define two methods, one with 3 and the other with 4 input vectors. What happens with the fourth vector when the function is called with only 3?

If it is taken as zero you can do something like

fun(v1::T,v2::T,v3::T) where T = fun(v1,v2,v3,zero(T))

for example
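A minimal sketch of that forwarding idea, so it can be run without any packages. The Vec3 struct, its zero and + methods, and the summing body of fun are hypothetical stand-ins for a StaticArray and for whatever fun really computes:

```julia
# Hypothetical stand-in for a 3-element StaticArray.
struct Vec3
    x::Float64
    y::Float64
    z::Float64
end
Base.zero(::Type{Vec3}) = Vec3(0.0, 0.0, 0.0)
Base.:+(a::Vec3, b::Vec3) = Vec3(a.x + b.x, a.y + b.y, a.z + b.z)

# The four-argument method does the real work (illustrative body: a sum).
fun(v1::T, v2::T, v3::T, v4::T) where {T} = v1 + v2 + v3 + v4

# The three-argument method forwards to it with a zero fourth argument.
fun(v1::T, v2::T, v3::T) where {T} = fun(v1, v2, v3, zero(T))

a, b, c = Vec3(1, 2, 3), Vec3(1, 1, 1), Vec3(0, 0, 1)
three = fun(a, b, c)
four = fun(a, b, c, zero(Vec3))
```

Both calls end up in the same four-argument method, so only one implementation has to be maintained.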

ntuple tends to be extremely fast in my experience.
Disproportionately so.

For the most performance-critical code, which I benchmark carefully, I often end up using ntuple.

@leandromartinez98 I have done this. The question is how to call these functions without explicitly writing the 3 or 4 inputs in a performance efficient way. So far, fun(vec[1],vec[2],vec[3]) is by far the fastest way that I could think of doing this (compared to the other options in my original post).

@oxinabox I can test that, but how do I use ntuple with arbitrary indices? I can do ntuple(i -> vec[i], 3) to get vec[1:3], but how can I get (vec[4], vec[8], vec[2])?
Thanks a lot!

ntuple(i->vec[indices[i]], length(indices))
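To make that one-liner concrete, here is a small sketch that gathers (vec[4], vec[8], vec[2]) and splats the result into a three-argument function. Plain tuples stand in for the StaticArrays, and the body of fun and the helper name call_fun are made up for illustration. I pass the length as Val(3) rather than length(indices); that is my own choice, on the assumption that a length known at compile time helps the compiler infer the tuple type:

```julia
# Tuples stand in for StaticArrays so the sketch needs no packages.
vecs = [(Float64(k), 2.0 * k) for k in 1:8]
indices = [4, 8, 2]

# Illustrative function that takes three separate arguments.
fun(a, b, c) = a[1] + b[1] + c[1]

# Gather the selected elements into a tuple, then splat the tuple into fun.
# call_fun is a hypothetical helper name; the length 3 is fixed via Val(3).
call_fun(vec, idx) = fun(ntuple(i -> vec[idx[i]], Val(3))...)

picked = ntuple(i -> vecs[indices[i]], Val(3))
result = call_fun(vecs, indices)
```

Each vec[idx[i]] here is a scalar lookup into idx followed by a scalar lookup into vec, so no intermediate vector of selected elements is built.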

Thanks a lot!
@oxinabox ntuple is indeed very fast. I see almost no performance cost. Problem solved! Thanks a bunch!

I hate that it is so much faster than splatting.
I think it is because it is generated and is easier to constant fold.

I am missing something here. What is the final solution?

vec[indices[i]] allocates a new vector, doesn’t it?

Even if that is faster than the splatting, it will not be as fast as a direct call like fun(v1,v2,v3).

Why would it? It is just two indexing operations: indices[i], and then using its result to index vec. It would make sense for the anonymous function (i->vec[indices[i]]) to be allocated, but it is probably inlined.

I think I formulated the question wrongly. indices[i] is necessarily a vector there, which has to be preallocated, meaning that it is something like

vec = [1,2,3,4,5]
indices = [2,3]
vec[indices]
#or
vec[ [2,3] ]

Is there a syntax that allows that without allocating the intermediate indices vector?
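One possible answer, as a sketch rather than a definitive recipe: keep the indices in a tuple instead of a vector and index element by element, so the result is a tuple too. The variable names below are made up, and I have not measured allocations here:

```julia
data = [10, 20, 30, 40, 50]

# Indexing with a vector of indices builds a new Vector for the result.
sub_vector = data[[2, 3]]

# Mapping a scalar lookup over a tuple of indices returns a tuple instead.
sub_tuple = map(i -> data[i], (2, 3))

# The same selection written with ntuple and a tuple of indices.
idx = (2, 3)
sub_ntuple = ntuple(i -> data[idx[i]], Val(2))
```

The tuple versions only work well when the number of indices is small and fixed, which matches the 3 or 4 arguments discussed in this thread.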

I don’t know why it’s fast, but it is. The compiler must be doing something fancy. When I try it on a super simple cut-down example, ntuple performs the same as splatting (...,), but inside my function it works great!
The syntax is so obtuse that VSCode highlights it as a possible error, though =P.

@leandromartinez98 I’m using fun(ntuple(i->vec[indices[k][i]],3));

Thanks again, everyone!

Your problem seems closely related to

I may misunderstand this, but the trick with the let block pointed out at the end of this thread may apply to your case as well.

Can you elaborate? Does the following contradict the above?

julia> const c=1000
1000

julia> @btime [2i for i in 1:c];
  864.123 ns (1 allocation: 7.94 KiB)

julia> @btime ntuple(i->2i, c);
  38.839 μs (751 allocations: 43.39 KiB)

Thanks.

It is fast for lengths up to 16. Much beyond that, you shouldn’t really be using tuples anyways.

Note that I said “I benchmark carefully” and that for the most performance-critical code I find ntuple fastest.

There is no contradiction.

You too should benchmark carefully with something representative of your use case.
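As a rough starting point for such a check, here is a sketch that uses only Base macros. The helper names gather, direct and measure are hypothetical, and tuples stand in for StaticArrays. It reports allocated bytes for the ntuple version and for the hand-written version; the numbers depend on your Julia version and types, so none are claimed here:

```julia
# Hypothetical helpers: the same three-element gather written two ways.
gather(vec, idx) = ntuple(i -> vec[idx[i]], Val(3))
direct(vec, idx) = (vec[idx[1]], vec[idx[2]], vec[idx[3]])

function measure(vecs, idx)
    # Call each once first so compilation is not part of the measurement.
    gather(vecs, idx)
    direct(vecs, idx)
    bytes_gather = @allocated gather(vecs, idx)
    bytes_direct = @allocated direct(vecs, idx)
    return bytes_gather, bytes_direct
end

vecs = [(Float64(k), 0.0, 0.0) for k in 1:10]
idx = (4, 8, 2)
bytes_gather, bytes_direct = measure(vecs, idx)
println("ntuple: ", bytes_gather, " bytes, direct: ", bytes_direct, " bytes")
```

For timings, the @btime macro from BenchmarkTools used earlier in this thread is the more usual tool; interpolating the arguments with $ (for example @btime gather($vecs, $idx)) keeps global-variable overhead out of the result.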

The two major cases I have were tuples that needed to correspond to fields of a struct, and tuples that corresponded to dimensions of an array.
In both cases we are talking about much less than 10 elements.

Benchmarking is crucial when you are trying to put the last shine on something.

OK, fair point.

I wasn’t trying to take a cheap shot, just trying to understand.

Thanks.
