Sum of tuples is slow

I ran the following tests:

using BenchmarkTools
a = collect(1: 100)
a_tuple = Tuple(a)
@benchmark sum($a)
@benchmark sum($a_tuple)

For the array a the output is

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.040 ns (0.00% GC)
  median time:      11.175 ns (0.00% GC)
  mean time:        11.428 ns (0.00% GC)
  maximum time:     102.707 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

For the tuple a_tuple the output is

BenchmarkTools.Trial: 
  memory estimate:  1.63 KiB
  allocs estimate:  4
  --------------
  minimum time:     4.123 μs (0.00% GC)
  median time:      4.310 μs (0.00% GC)
  mean time:        5.077 μs (1.88% GC)
  maximum time:     966.656 μs (98.90% GC)
  --------------
  samples:          10000
  evals/sample:     7

Have I done something wrong, or are tuples indeed much slower in this example?

Tuples don’t necessarily have elements of the same type, so it appears that the sum function isn’t clever enough to realize it could use a vectorized sum in this case.

foldl is a bit faster, or you can just reinterpret to an array when you know the elements are all the same type:

julia> @benchmark foldl(+, $a_tuple)
BenchmarkTools.Trial:
  memory estimate:  832 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.001 μs (0.00% GC)
  median time:      1.159 μs (0.00% GC)
  mean time:        1.124 μs (0.40% GC)
  maximum time:     46.261 μs (97.10% GC)
  --------------
  samples:          10000
  evals/sample:     10


julia> @benchmark reinterpret(Int, [$a_tuple])
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  2
  --------------
  minimum time:     54.358 ns (0.00% GC)
  median time:      67.345 ns (0.00% GC)
  mean time:        74.515 ns (7.90% GC)
  maximum time:     508.725 ns (84.20% GC)
  --------------
  samples:          10000
  evals/sample:     986
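
To make the reinterpret trick concrete, here is a minimal sketch (using the same `a` and `a_tuple` as in the original post) showing that wrapping the tuple in a one-element vector and reinterpreting it recovers an ordinary `Int` array with the same sum:

```julia
# Sketch, assuming the a / a_tuple from the original post.
a = collect(1:100)
a_tuple = Tuple(a)

# [a_tuple] is a 1-element Vector{NTuple{100,Int64}}; reinterpreting its
# memory as Int gives a 100-element array view over the same bits.
flat = reinterpret(Int, [a_tuple])

length(flat)              # 100
sum(flat) == sum(a_tuple) # true, and the sum now takes the fast array path
```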

Tuples are blazing fast if their length is <= 32:

julia> a = collect(1:32);
       a_tuple = Tuple(a);
       b = Vector{Int}(undef,1);

julia> @benchmark $b[1] = sum($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     17.451 ns (0.00% GC)
  median time:      17.508 ns (0.00% GC)
  mean time:        20.399 ns (0.00% GC)
  maximum time:     73.336 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

julia> @benchmark $b[1] = sum($a_tuple)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.847 ns (0.00% GC)
  median time:      7.981 ns (0.00% GC)
  mean time:        9.270 ns (0.00% GC)
  maximum time:     62.961 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

(Assigning to $b[1] is necessary to prevent the compiler from eliminating the tuple computations altogether.)


@robsmith11 Well, arrays don’t necessarily have elements of the same type either. It sounds a bit odd to me that tuples, being less flexible than arrays, are slower.
@ettersi Indeed, I noticed that the time taken by sum() on tuples changed drastically between 20 and 40 elements, but the reason is still a mystery to me.

A mystery to me is why you have tuples with more than 32 elements, XD. Isn’t it better to use StaticArrays.jl at that point? What is your use case?
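
For reference, this is roughly what that would look like. A sketch, assuming StaticArrays.jl is installed and that I remember its `SVector` tuple constructor correctly:

```julia
using StaticArrays  # assumes StaticArrays.jl is installed

a_tuple = Tuple(collect(1:100))

# SVector wraps the tuple in a statically sized, homogeneously typed
# vector, so sum dispatches to StaticArrays' optimised methods instead
# of the generic tuple fallback.
sv = SVector(a_tuple)
sum(sv)  # 5050
```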

In general yes, but the compiler can infer whether a specific array has homogeneous elements, and optimise accordingly. So in your case, it knows that it’s adding 64-bit integers.

julia> typeof(collect(1: 100))
Array{Int64,1}

However, in this case, that is not the cause of the problem, since it has that information for Tuples as well. It’s just that, as was said previously, Tuples are super optimised for small collections.

julia> typeof(Tuple(collect(1: 100)))
NTuple{100,Int64}

@Henrique_Becker I have no practical use in mind. I just did this test and I found it odd. As I said previously I used to think that tuples, being less flexible, would be more efficient for this kind of simple operation, but it seems that I am wrong.

Thank you guys for the comments/explanations.

No, as people have said, they are more efficient, but they are intended for small collections.


Yep, the small collections is key.

They’re also more flexible in some ways:

julia> typeof((3.5, "Hi"))
Tuple{Float64,String}

julia> typeof([3.5, "Hi"])
Array{Any,1}

Operations with the tuple will be inferable and thus fast, whereas some operations with the array will be non-inferable and thus slow.
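
You can check this with `Base.return_types` (a quick sketch; the point is what inference reports for the element access, not the exact printed form):

```julia
# For the tuple, inference sees a concrete element type at position 1:
Base.return_types(first, (Tuple{Float64,String},))  # inferred as Float64

# For the heterogeneous array, inference can only say Any:
Base.return_types(first, (Vector{Any},))            # inferred as Any
```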


There is no inherent reason why sum(a_tuple) with length(a_tuple) > 32 has to be slow. The only reason why it currently is slow is because tuples aren’t intended to be used as long static-sized vectors, and so Julia is not optimised for this case.

Specifically, the issue is that optimizing performance here makes the compiler a little crazy. Also, if you don’t have a cutoff, your type system is Turing-complete (which is bad – you want inference to terminate).
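
To illustrate why long tuples stress the compiler: a naive tuple sum recurses on the tuple’s *type*, so the compiler must generate and infer one specialisation per length. This is a sketch with made-up names, not Base’s actual implementation:

```julia
# Hypothetical recursive tuple sum (not what Base actually does).
# Each call peels one element off the tuple's type, so summing an
# NTuple{100,Int} forces ~100 nested specialisations at compile time --
# hence the cutoff for small tuples in Base.
tsum(t::Tuple{}) = 0
tsum(t::Tuple) = first(t) + tsum(Base.tail(t))

tsum((1, 2, 3))  # 6
```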
