Sum over tuple slower than sum over array

Hi there,

I wrote a trivial sum function:

function my_sum(collection)
    s = zero(eltype(collection))
    for i in eachindex(collection)
        s += collection[i]
    end
    return s
end

When I apply it to a tuple of Float64 I get the following:

my_tuple = Tuple(rand(10^7));
@btime my_sum($(my_tuple))
  9.295 ms (0 allocations: 0 bytes)
5.000249938227185e6

which is very similar to when I apply it to an equal length array of the same type:

my_vec = rand(10^7);
@btime my_sum($(my_vec))
  9.296 ms (0 allocations: 0 bytes)
5.000539508945694e6

Actually slightly better with the tuple as expected since the type provides the length explicitly and therefore if anything my intuition is that this would help. However, if I do the same with built-in sum function I get:

@btime sum($(my_tuple))
  9.300 ms (0 allocations: 0 bytes)
5.000249938227185e6
@btime sum($(my_vec))
  1.322 ms (0 allocations: 0 bytes)
5.000539508945564e6

Which is a big improvement when using an array! I know the functions dispatched are different but why this difference? does vectorization depend on using an Array-type object?

Thanks in advance!
Miguel.

Tuples of 10^7 elements will be slow. This is not what tuples are for — usually they should have only a handful of elements, and almost never over 100.

Think of a tuple as a small anonymous struct in which you refer to fields by index rather than by name. (A named tuple is even closer to an anonymous struct.) You wouldn’t define a struct with 10^7 fields, I hope?

8 Likes

I see! I saw tuples as static-size arrays given that (aside from the type name) they have a type label and also a size label and therefore I thought that whenever possible making use of this static property by letting the compiler take it for granted would be weakly better. However, my intuition seems too naive! Thanks a lot for the reply.

StaticArrays.jl use tuples under the hood, and you should also use them only for small arrays. Static-size arrays don’t help the compiler when the length is large, and can easily hurt — it’s like you’re trying to compile code that has 10^7 local variables. See, for example:

3 Likes