Performance tips on turning arrays into tuples

Ribeiro · February 18, 2021, 11:35pm

Hello,
I have some Vectors of StaticArrays and I use them in functions like this:

fun(vec[1],vec[2],vec[3])

where each element of vec is a StaticArray.
It would be helpful to do something like this:

fun(vec[1:3])
fun(vec[1:3]...,)

because I’m declaring these functions with 3 or 4 inputs. However, I notice a performance penalty both from splatting the slice into separate arguments (...) and from passing the 3 StaticArrays as a single vector.
Is there a way to do this that is as fast as fun(vec[1],vec[2],vec[3]), but would let me pass 3 or 4 inputs to fun?
The indices 1:3 are only placeholders. In reality they are never consecutive; they come from another vector of StaticArrays (vec[indices[n][1]], vec[indices[n][2]], vec[indices[n][3]], or vec[indices[n]] to keep them grouped).
Thanks a lot!

lmiq · February 18, 2021, 11:58pm

You could explicitly define two methods, one with 3 and the other with 4 input vectors. What happens to the fourth vector when the function is called with only 3?

If it is taken as zero, you can do something like

fun(v1::T,v2::T,v3::T) where T = fun(v1,v2,v3,zero(T))

for example
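A minimal sketch of the two-method approach, assuming StaticArrays.jl `SVector` inputs and a hypothetical 4-argument `fun` that simply sums its inputs (the real kernel is not shown in the thread). The 3-argument method pads with a zero vector; this sketch uses `zero(v1)`, which has the same length and element type as the inputs:

```julia
using StaticArrays

# Hypothetical 4-argument kernel (stand-in for the real `fun`): sums its inputs.
fun(v1::T, v2::T, v3::T, v4::T) where {T<:SVector} = v1 + v2 + v3 + v4

# 3-argument method: forward to the 4-argument one, padding with a zero vector
# of the same type as the inputs.
fun(v1::T, v2::T, v3::T) where {T<:SVector} = fun(v1, v2, v3, zero(v1))

a = SVector(1.0, 2.0, 3.0)
b = SVector(4.0, 5.0, 6.0)
c = SVector(7.0, 8.0, 9.0)

fun(a, b, c)   # == SVector(12.0, 15.0, 18.0)
```

Dispatch picks the method from the number of arguments at compile time, so the 3-argument call should cost about the same as writing the zero padding by hand.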

oxinabox · February 19, 2021, 12:13am

ntuple tends to be extremely fast in my experience.
Disproportionately so.

For the most performance-critical code, which I benchmark carefully, I often end up using ntuple.

Ribeiro · February 19, 2021, 8:55am

@lmiq I have done this. The question is how to call these functions efficiently without explicitly writing out the 3 or 4 inputs. So far, fun(vec[1],vec[2],vec[3]) is by far the fastest way I could think of (compared to the other options in my original post).

@oxinabox I can test that, but how do I use ntuple with arbitrary indices? I can do ntuple(i -> vec[i], 3) to get vec[1:3], but how can I get (vec[4], vec[8], vec[2])?
Thanks a lot!

Vasily_Pisarev · February 19, 2021, 9:34am

ntuple(i->vec[indices[i]], length(indices))
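Combined with a splat, this covers the `(vec[4], vec[8], vec[2])` case from the question. A sketch, assuming a hypothetical three-argument `fun` and plain `Float64` elements standing in for the StaticArrays; `indices` is a tuple here, so `length(indices)` is known from its type:

```julia
# Hypothetical stand-in for the real kernel; any 3-argument function works.
fun(a, b, c) = a + 2b + 3c

vec = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
indices = (4, 8, 2)   # tuple of indices: its length is part of its type

# Gather vec[4], vec[8], vec[2] into a tuple.
args = ntuple(i -> vec[indices[i]], length(indices))   # (40.0, 80.0, 20.0)

# Splat the tuple into separate arguments: same as fun(vec[4], vec[8], vec[2]).
fun(args...)   # 260.0
```

Splatting a tuple whose length is known from its type tends to be cheap; the costly case is usually splatting a `Vector`, whose length is only known at run time.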

Ribeiro · February 19, 2021, 9:51am

Thanks a lot!
@oxinabox ntuple is indeed very fast. I see almost no performance cost. Problem solved! Thanks a bunch!

oxinabox · February 19, 2021, 11:07am

I hate that it is so much faster than splatting.
I think it is because it is generated and is easier to constant-fold.
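The exact mechanism above is oxinabox's guess. One related, documented option is the length argument: `ntuple(f, Val(N))` puts the length into the type, so the result is inferred as an `N`-tuple even when `N` is not a literal at the call site. A sketch with hypothetical helpers `gather` and `gather_splat`, contrasting it with splatting a slice:

```julia
# Hypothetical helper: gather vec[idx[1]], vec[idx[2]], ... into a tuple.
# N comes from the type of idx, so Val(N) is a compile-time constant.
gather(vec, idx::NTuple{N,Int}) where {N} = ntuple(i -> vec[idx[i]], Val(N))

# Splatting a slice: vec[collect(idx)] builds a Vector first, and the length
# of the resulting tuple is not part of its inferred type.
gather_splat(vec, idx) = (vec[collect(idx)]...,)

vec = [1.5, 2.5, 3.5, 4.5]
idx = (3, 1)

gather(vec, idx)         # (3.5, 1.5)
gather_splat(vec, idx)   # (3.5, 1.5): same value, but built via a heap-allocated Vector

# The Val version is inferred as a concrete 2-tuple:
Base.return_types(gather, (Vector{Float64}, NTuple{2,Int}))
```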

lmiq · February 19, 2021, 11:53am

I am missing something here. What is the final solution?

vec[indices[i]] allocates a new vector, doesn’t it?

Even if that is faster than splatting, it will not be as fast as a direct call like fun(v1,v2,v3).

Henrique_Becker · February 19, 2021, 1:05pm

Why would it? It is just two indexing operations: indices[i], and then indexing vec with the result. One might expect the anonymous function (i->vec[indices[i]]) to be allocated, but it is probably inlined.

lmiq · February 19, 2021, 1:11pm

I think I phrased the question badly. indices[i] is necessarily a vector there, which has to be allocated beforehand, i.e. something like

vec = [1,2,3,4,5]
indices = [2,3]
vec[indices]
#or
vec[ [2,3] ]

Is there a syntax that allows that without allocating the intermediate indices vector?
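One way to avoid the intermediate index vector, sketched with Base only: keep the indices in a tuple instead of a `Vector`. A tuple of `Int`s is an immutable isbits value, so gathering through it with `map` or `ntuple` typically needs no heap allocation, whereas `vec[[2, 3]]` allocates both the index vector and the result:

```julia
vec = [10, 20, 30, 40, 50]

# Allocates twice: the index vector [2, 3] and the result vector.
a = vec[[2, 3]]                             # [20, 30]

# Indices stored in a tuple: map over a tuple returns a tuple.
idx = (2, 3)
b = map(i -> vec[i], idx)                   # (20, 30)

# The same gather written with ntuple, as suggested in the thread.
c = ntuple(k -> vec[idx[k]], length(idx))   # (20, 30)
```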

Ribeiro · February 19, 2021, 1:18pm

I don’t know why it’s fast, but it is. The compiler must be doing something fancy. When I try a super simple cut-down example, ntuple performs the same as splatting (...,), but inside my actual function it works great!
The syntax is so obtuse that VSCode highlights it as a possible error, though =P.

@lmiq I’m using fun(ntuple(i->vec[indices[k][i]],3));
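For reference, a sketch of that final pattern, assuming `indices` is a `Vector` of 3-tuples (a vector of `SVector{3,Int}` would index the same way) and a hypothetical `fun` method that accepts the gathered tuple and forwards it to the 3-argument method by splatting:

```julia
# Hypothetical 3-argument kernel.
fun(a, b, c) = a * b + c

# Hypothetical convenience method: accept a 3-tuple and splat it.
fun(t::NTuple{3,Any}) = fun(t...)

vec = [2.0, 3.0, 5.0, 7.0, 11.0]
indices = [(1, 2, 3), (5, 4, 1)]   # each entry: which elements of vec to pass

# The pattern from the post: gather with ntuple, then call fun on the tuple.
results = [fun(ntuple(i -> vec[indices[k][i]], 3)) for k in eachindex(indices)]
# results == [11.0, 79.0]
```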

Thanks again, everyone!

hendri54 · February 19, 2021, 7:40pm

Your problem seems closely related to

I may misunderstand this, but the trick with the let block pointed out at the end of this thread may apply to your case as well.

healyp · February 20, 2021, 4:47pm

Can you elaborate? Does the following contradict the above?

julia> const c=1000
1000

julia> @btime [2i for i in 1:c];
  864.123 ns (1 allocation: 7.94 KiB)

julia> @btime ntuple(i->2i, c);
  38.839 μs (751 allocations: 43.39 KiB)

Thanks.

simeonschaub · February 20, 2021, 4:51pm

It is fast for lengths up to 16. Much beyond that, you shouldn’t really be using tuples anyway.

oxinabox · February 20, 2021, 6:21pm

Note that I said “I benchmark carefully” and that for “most performance critical code I find ntuple fastest”

There is no contradiction.

You too should benchmark carefully with something representative of your use case.
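A sketch of such a benchmark with BenchmarkTools.jl (the package that provides `@btime` above), comparing the three call styles from the original question. The kernel `fun` and the helpers `direct`, `splatted` and `tupled` are hypothetical; no timings are shown because they depend on the machine and Julia version:

```julia
using BenchmarkTools

fun(a, b, c) = a + b + c   # hypothetical stand-in kernel

# Explicit indexing, as in fun(vec[1], vec[2], vec[3]).
direct(vec, idx)    = fun(vec[idx[1]], vec[idx[2]], vec[idx[3]])
# Slice then splat: the slice allocates and its length is unknown to the compiler.
splatted(vec, idxv) = fun(vec[idxv]...)
# ntuple with a compile-time length (assumes exactly 3 indices).
tupled(vec, idx)    = fun(ntuple(i -> vec[idx[i]], Val(3))...)

vec  = collect(1.0:100.0)
idx  = (4, 8, 2)    # indices as a tuple
idxv = [4, 8, 2]    # same indices as a Vector, for the slicing variant

# `$` interpolates the globals so only the call itself is timed.
@btime direct($vec, $idx)
@btime splatted($vec, $idxv)
@btime tupled($vec, $idx)
```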

The two major cases I have had were tuples that needed to correspond to the fields of a struct, and tuples that corresponded to the dimensions of an array.
In both cases we are talking about far fewer than 10 elements.
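A sketch of those two kinds of use, with a hypothetical `Particle` struct and hypothetical helpers `fields` and `center`. In both, the tuple length is determined by a type (`fieldcount(T)`, the array's dimensionality `N`), so it stays small and known to the compiler:

```julia
struct Particle   # hypothetical example struct
    x::Float64
    y::Float64
    m::Float64
end

# Tuple of a struct's field values; the length fieldcount(T) is fixed by the type.
fields(p::T) where {T} = ntuple(i -> getfield(p, i), fieldcount(T))

# One entry per array dimension: the middle index along each axis
# (rounded down for even sizes; assumes 1-based indexing).
center(A::AbstractArray{<:Any,N}) where {N} = ntuple(d -> div(size(A, d) + 1, 2), Val(N))

fields(Particle(1.0, 2.0, 0.5))   # (1.0, 2.0, 0.5)
center(zeros(3, 4, 5))            # (2, 2, 3)
```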

Benchmarking is crucial when you are trying to put the last shine on something.

healyp · February 20, 2021, 6:27pm

OK, fair point.

I wasn’t trying to take a cheap shot, just trying to understand.

Thanks.

Related topics

Topic	Category; tags	Replies	Views	Activity
Vectors vs. Tuples and multiple dispatch	New to Julia; performance	3	944	April 14, 2021
Converting Vector to Tuple seems not optimal	Performance; tuple, memory-allocation	11	423	June 22, 2024
Converting Vector{Any} to Tuple of Floats	New to Julia	11	9712	August 20, 2018
Array of tuples to tuple of arrays	New to Julia	3	2703	October 31, 2018
Should Julia be able to optimize away small temporary arrays?	New to Julia; question, performance, array, tuple	10	1249	July 23, 2022