Sum of tuples is slow

I ran the following tests:

using BenchmarkTools
a = collect(1: 100)
a_tuple = Tuple(a)
@benchmark sum($a)
@benchmark sum($a_tuple)

For the array a the output is

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.040 ns (0.00% GC)
  median time:      11.175 ns (0.00% GC)
  mean time:        11.428 ns (0.00% GC)
  maximum time:     102.707 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

For the tuple a_tuple the output is

BenchmarkTools.Trial: 
  memory estimate:  1.63 KiB
  allocs estimate:  4
  --------------
  minimum time:     4.123 μs (0.00% GC)
  median time:      4.310 μs (0.00% GC)
  mean time:        5.077 μs (1.88% GC)
  maximum time:     966.656 μs (98.90% GC)
  --------------
  samples:          10000
  evals/sample:     7

Have I done something wrong, or are tuples indeed much slower in this example?

Tuples don’t necessarily have elements of the same type, so it appears that the sum function isn’t clever enough to realize it could use a vectorized sum in this case.
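A quick way to see this is to compare against summing the same values through a `Vector`, which does hit the vectorized array method (a minimal sketch; the `collect` allocates, but the subsequent sum is fast):

```julia
# Hedged sketch: converting the tuple to a Vector first lets sum use
# the SIMD-friendly array method, at the cost of one allocation.
a_tuple = Tuple(1:100)
@assert sum(collect(a_tuple)) == sum(a_tuple) == 5050
```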

foldl is a bit faster, or you can just reinterpret to an array when you know the types are all the same:

BenchmarkTools.Trial:
  memory estimate:  832 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.001 μs (0.00% GC)
  median time:      1.159 μs (0.00% GC)
  mean time:        1.124 μs (0.40% GC)
  maximum time:     46.261 μs (97.10% GC)
  --------------
  samples:          10000
  evals/sample:     10
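The exact `foldl` call being benchmarked was omitted from the post; it was presumably along these lines (a sketch, names assumed from the earlier snippet):

```julia
# Hedged sketch of the foldl approach; the exact benchmarked command
# was not shown in the post, but it was presumably this.
a_tuple = Tuple(1:100)
@assert foldl(+, a_tuple) == sum(a_tuple)  # same result as sum
# With BenchmarkTools loaded: @benchmark foldl(+, $a_tuple)
```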


julia> @benchmark reinterpret(Int, [$a_tuple])
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  2
  --------------
  minimum time:     54.358 ns (0.00% GC)
  median time:      67.345 ns (0.00% GC)
  mean time:        74.515 ns (7.90% GC)
  maximum time:     508.725 ns (84.20% GC)
  --------------
  samples:          10000
  evals/sample:     986
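For what it's worth, a quick sanity check (a sketch) that the `reinterpret` trick preserves the values: wrapping the tuple in a one-element array and reinterpreting views the tuple's memory as an `Int` vector.

```julia
# Hedged sketch: NTuple{100,Int64} is a bits type, so reinterpreting a
# one-element Vector of it yields a length-100 Int view of the same data.
a_tuple = Tuple(1:100)
v = reinterpret(Int, [a_tuple])
@assert length(v) == 100
@assert sum(v) == sum(a_tuple) == 5050
```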

Tuples are blazing fast if their length is <= 32:

julia> a = collect(1:32);
       a_tuple = Tuple(a);
       b = Vector{Int}(undef,1);

julia> @benchmark $b[1] = sum($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.451 ns (0.00% GC)
  median time:      17.508 ns (0.00% GC)
  mean time:        20.399 ns (0.00% GC)
  maximum time:     73.336 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

julia> @benchmark $b[1] = sum($a_tuple)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.847 ns (0.00% GC)
  median time:      7.981 ns (0.00% GC)
  mean time:        9.270 ns (0.00% GC)
  maximum time:     62.961 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

(Assigning to $b[1] is necessary to prevent the compiler from eliminating the tuple computations altogether.)


@robsmith11 Well, arrays don’t necessarily have elements of the same type either. It sounds a bit odd to me that tuples, being less flexible than arrays, are slower.
@ettersi Indeed, I noticed that the time taken by sum() on tuples changed drastically somewhere between 20 and 40 elements, but the reason is still a mystery to me.

A mystery to me is why you have tuples with more than 32 elements, XD. Isn’t it better to use StaticArrays.jl at that point? What is your use case?

In general yes, but the compiler can infer whether a specific array has homogeneous elements, and thus can optimise accordingly. So in your case, it knows that it’s adding 64-bit integers.

julia> typeof(collect(1: 100))
Array{Int64,1}

However, in this case, that is not the cause of the problem, since it has that information for Tuples as well. It’s just that, as was said previously, Tuples are super optimised for small collections.

julia> typeof(Tuple(collect(1: 100)))
NTuple{100,Int64}

@Henrique_Becker I have no practical use in mind. I just did this test and I found it odd. As I said previously I used to think that tuples, being less flexible, would be more efficient for this kind of simple operation, but it seems that I am wrong.

Thank you guys for the comments/explanations.

No, as people have said, they are more efficient, but they are intended for small collections.


Yep, the small collections is key.

They’re also more flexible in some ways:

julia> typeof((3.5, "Hi"))
Tuple{Float64,String}

julia> typeof([3.5, "Hi"])
Array{Any,1}

Operations with the tuple will be inferrable and thus fast, whereas some operations with the array will be noninferrable and thus slow.
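You can see the inference difference directly by asking the compiler what return type it can prove (a sketch; `Base.return_types` is an introspection helper, not part of the stable public API):

```julia
# Hedged sketch: first on the heterogeneous tuple infers a concrete
# Float64, while first on a Vector{Any} can only infer Any.
@assert Base.return_types(first, (Tuple{Float64,String},)) == [Float64]
@assert Base.return_types(first, (Vector{Any},)) == [Any]
```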


There is no inherent reason why sum(a_tuple) with length(a_tuple) > 32 has to be slow. The only reason why it currently is slow is because tuples aren’t intended to be used as long static-sized vectors, and so Julia is not optimised for this case.

Specifically, the issue is that optimizing performance here makes the compiler work very hard. Also, if you don’t have a cutoff, your type system is Turing-complete (which is bad: you want inference to terminate).


Just ran into the same with cumsum:

julia> using BenchmarkTools

julia> tpl = (rand(1:10, 1)...,); @btime cumsum($tpl);
  2.520 ns (0 allocations: 0 bytes)

julia> tpl = (rand(1:10, 10)...,); @btime cumsum($tpl);
  3.785 ns (0 allocations: 0 bytes)

julia> tpl = (rand(1:10, 32)...,); @btime cumsum($tpl);
  14.795 ns (0 allocations: 0 bytes)

julia> tpl = (rand(1:10, 33)...,); @btime cumsum($tpl);
  3.041 μs (9 allocations: 1.89 KiB)

julia> tpl = (rand(1:10, 64)...,); @btime cumsum($tpl);
  47.400 μs (102 allocations: 28.34 KiB)

julia> tpl = (rand(1:10, 100)...,); @btime cumsum($tpl);
  128.165 μs (245 allocations: 78.97 KiB)

julia> VERSION
v"1.10.0-beta3"

The compiler may not be able to optimize as well for larger tuples, but the jump from ns to μs still seems huge. The cost seems to grow non-linearly, too, but not in a simple fashion. I guess cumsum goes through intermediate tuples of different sizes?
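If you can afford the allocation, one workaround (a sketch) is to accumulate through an intermediate `Vector`, which sidesteps the per-length tuple code paths for long tuples:

```julia
# Hedged sketch: cumsum over collect(tpl) allocates a Vector, but avoids
# the expensive long-tuple specialization; Tuple(...) converts back.
tpl = Tuple(1:100)  # deterministic stand-in for (rand(1:10, 100)...,)
@assert Tuple(cumsum(collect(tpl))) == cumsum(tpl)
```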

Would there be any value in Julia providing support for persistent arrays in Base (where elements are immutable)? StaticArrays.jl was not intended for, and does not work well with, large arrays. FunctionalCollections.jl supports a “PersistentVector” type, which does work for longer arrays, but it does not seem to facilitate compilation the way StaticArrays does, and it does not support more general array types than Vector. BTW, how does Julia implement its arrays? When I read through the Julia Base code, it looked like arrays were built into Julia rather than being written in Julia. I wonder if that imposes meaningful limitations on packages defining persistent arrays?