Sum of tuples is slow

I ran the following tests:

using BenchmarkTools
a = collect(1: 100)
a_tuple = Tuple(a)
@benchmark sum($a)
@benchmark sum($a_tuple)

For the array a the output is

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.040 ns (0.00% GC)
  median time:      11.175 ns (0.00% GC)
  mean time:        11.428 ns (0.00% GC)
  maximum time:     102.707 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

For the tuple a_tuple the output is

BenchmarkTools.Trial: 
  memory estimate:  1.63 KiB
  allocs estimate:  4
  --------------
  minimum time:     4.123 μs (0.00% GC)
  median time:      4.310 μs (0.00% GC)
  mean time:        5.077 μs (1.88% GC)
  maximum time:     966.656 μs (98.90% GC)
  --------------
  samples:          10000
  evals/sample:     7

Have I done something wrong, or are tuples indeed much slower in this example?

Tuples don’t necessarily have elements of the same type, so it appears that the sum function isn’t clever enough to realize it could use a vectorized sum in this case.

foldl is a bit faster, or you can just reinterpret to an array when you know the elements are all the same type:

julia> @benchmark foldl(+, $a_tuple)
BenchmarkTools.Trial:
  memory estimate:  832 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.001 μs (0.00% GC)
  median time:      1.159 μs (0.00% GC)
  mean time:        1.124 μs (0.40% GC)
  maximum time:     46.261 μs (97.10% GC)
  --------------
  samples:          10000
  evals/sample:     10


julia> @benchmark reinterpret(Int, [$a_tuple])
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  2
  --------------
  minimum time:     54.358 ns (0.00% GC)
  median time:      67.345 ns (0.00% GC)
  mean time:        74.515 ns (7.90% GC)
  maximum time:     508.725 ns (84.20% GC)
  --------------
  samples:          10000
  evals/sample:     986
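
To make the reinterpret trick concrete, here is a minimal sketch (using the same `a` and `a_tuple` as in the original post) showing that wrapping the tuple in a one-element vector and reinterpreting it recovers an ordinary `Int` array with the same sum:

```julia
# Sketch, assuming the a / a_tuple from the original post.
a = collect(1:100)
a_tuple = Tuple(a)

# [a_tuple] is a 1-element Vector{NTuple{100,Int64}}; reinterpreting its
# memory as Int gives a 100-element array view over the same bits.
flat = reinterpret(Int, [a_tuple])

length(flat)              # 100
sum(flat) == sum(a_tuple) # true, and the sum now takes the fast array path
```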

Tuples are blazing fast if their length is <= 32:

julia> a = collect(1:32);
       a_tuple = Tuple(a);
       b = Vector{Int}(undef,1);

julia> @benchmark $b[1] = sum($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.451 ns (0.00% GC)
  median time:      17.508 ns (0.00% GC)
  mean time:        20.399 ns (0.00% GC)
  maximum time:     73.336 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

julia> @benchmark $b[1] = sum($a_tuple)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.847 ns (0.00% GC)
  median time:      7.981 ns (0.00% GC)
  mean time:        9.270 ns (0.00% GC)
  maximum time:     62.961 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

(Assigning to $b[1] is necessary to prevent the compiler from eliminating the tuple computations altogether.)


@robsmith11 Well, arrays don’t necessarily have elements of the same type either. It sounds a bit odd to me that tuples, being less flexible than arrays, are slower.
@ettersi Indeed, I noticed that the time taken by sum() on tuples changed drastically between 20 and 40 elements, but the reason is still a mystery to me.

A mystery to me is why you have tuples with more than 32 elements, XD. Isn’t it better to use StaticArrays.jl at that point? What is your use case?
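
For reference, this is roughly what that would look like. A sketch, assuming StaticArrays.jl is installed and that I remember its `SVector` tuple constructor correctly:

```julia
using StaticArrays  # assumes StaticArrays.jl is installed

a_tuple = Tuple(collect(1:100))

# SVector wraps the tuple in a statically sized, homogeneously typed
# vector, so sum dispatches to StaticArrays' optimised methods instead
# of the generic tuple fallback.
sv = SVector(a_tuple)
sum(sv)  # 5050
```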

In general yes, but the compiler can infer whether a specific array has homogeneous elements, and optimise accordingly. So in your case, it knows that it’s adding 64-bit integers.

julia> typeof(collect(1: 100))
Array{Int64,1}

However, in this case, that is not the cause of the problem, since it has that information for Tuples as well. It’s just that, as was said previously, Tuples are super optimised for small collections.

julia> typeof(Tuple(collect(1: 100)))
NTuple{100,Int64}

@Henrique_Becker I have no practical use in mind. I just did this test and I found it odd. As I said previously I used to think that tuples, being less flexible, would be more efficient for this kind of simple operation, but it seems that I am wrong.

Thank you guys for the comments/explanations.

No, as people have said, they are more efficient, but they are intended for small collections.


Yep, the small collections is key.

They’re also more flexible in some ways:

julia> typeof((3.5, "Hi"))
Tuple{Float64,String}

julia> typeof([3.5, "Hi"])
Array{Any,1}

Operations with the tuple will be inferable and thus fast, whereas some operations with the array will be non-inferable and thus slow.
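
You can check this with `Base.return_types` (a quick sketch; the point is what inference reports for the element access, not the exact printed form):

```julia
# For the tuple, inference sees a concrete element type at position 1:
Base.return_types(first, (Tuple{Float64,String},))  # inferred as Float64

# For the heterogeneous array, inference can only say Any:
Base.return_types(first, (Vector{Any},))            # inferred as Any
```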


There is no inherent reason why sum(a_tuple) with length(a_tuple) > 32 has to be slow. The only reason why it currently is slow is because tuples aren’t intended to be used as long static-sized vectors, and so Julia is not optimised for this case.

Specifically, the issue is that optimizing performance here makes the compiler a little crazy. Also, if you don’t have a cutoff, your type system is Turing-complete (which is bad – you want inference to terminate).
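
To illustrate why long tuples stress the compiler: a naive tuple sum recurses on the tuple’s *type*, so the compiler must generate and infer one specialisation per length. This is a sketch with made-up names, not Base’s actual implementation:

```julia
# Hypothetical recursive tuple sum (not what Base actually does).
# Each call peels one element off the tuple's type, so summing an
# NTuple{100,Int} forces ~100 nested specialisations at compile time --
# hence the cutoff for small tuples in Base.
tsum(t::Tuple{}) = 0
tsum(t::Tuple) = first(t) + tsum(Base.tail(t))

tsum((1, 2, 3))  # 6
```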
