Performance of recursively building tuples

I was reading through the source code for broadcasting and CartesianIndices, and I noticed that many methods operate on tuples recursively. That struck me as odd, since iteration seemed more intuitive, but I figured there might be some sort of tail-call optimization going on.

So I wrote a short script to compare the tail-recursive way and an iterative way.

# tail-recursive way
f(x) = (a(x[1]), f(Base.tail(x))...)
f(x::Tuple{Any}) = (a(x[1]),)  # base case returns a 1-tuple so f is type-consistent

# iterative way
g(x) = Tuple(a(e) for e in x)

# operation on tuple element
a(e) = 2*e

x = Tuple(1:1000) # the input tuple

@time f(x) # 14.826541 seconds (9.32 M allocations: 447.837 MiB, 5.50% gc time)
@time f(x) # 0.198740 seconds (842.29 k allocations: 35.926 MiB, 3.62% gc time)
@time g(x) # 0.031490 seconds (74.45 k allocations: 3.950 MiB)
@time g(x) # 0.000074 seconds (753 allocations: 43.484 KiB)

It appears that the iterative way is actually faster here. Can someone explain this result, and why recursion is used so often to build tuples?

Well, the tuples in question are the dimensions of arrays, which in practice have length around 5, and the optimizer handles that fine. We’ve been planning to add optimizations for long tuples, but that hasn’t been a priority yet.


So are relatively short tuples (say, <10 elements) optimized specially, or do the performance hits only start adding up at larger lengths?

Bit of both. There are limits in the compiler that penalize extra long tuples.
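To make that concrete, here’s a small sketch (the `dbl` helper is my own, not from the thread) showing that for a short tuple the recursive build is fully unrolled and inference produces a concrete return type:

```julia
# Hypothetical helper: recursively double each element of a tuple,
# in the same style as f from the original post.
dbl(x::Tuple{}) = ()
dbl(x::Tuple) = (2 * first(x), dbl(Base.tail(x))...)

dbl((1, 2, 3))  # -> (2, 4, 6)

# For a short concrete tuple type, inference resolves the whole
# recursion down to a concrete NTuple:
Base.return_types(dbl, (NTuple{5, Int},))  # -> [NTuple{5, Int64}]
```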


The timing profile is quite interesting:

julia> using BenchmarkTools

julia> x=Tuple(1:30);@btime f(x);
  204.906 ns (4 allocations: 320 bytes)

julia> x=Tuple(1:30);@btime g(x);
  843.543 ns (2 allocations: 592 bytes)

julia> x=Tuple(1:40);@btime f(x);
  16.220 μs (34 allocations: 7.39 KiB)

julia> x=Tuple(1:40);@btime g(x);
  1.102 μs (2 allocations: 736 bytes)

I think it’s useful to look at @code_typed for small tuples, to see that the recursive version generates the same code that you would write by hand, so it is extremely efficient.

julia> mylog(a::Tuple{}) = ()
mylog (generic function with 1 method)

julia> mylog(a::Tuple) = (log(first(a)), mylog(Base.tail(a))...)
mylog (generic function with 2 methods)

julia> @code_typed mylog((1.0, 2.0, 3.0))
CodeInfo(
1 ─ %1 = Base.getfield(a, 1, true)::Float64
│   %2 = invoke Main.log(%1::Float64)::Float64
│   %3 = (getfield)(a, 2)::Float64
│   %4 = (getfield)(a, 3)::Float64
│   %5 = invoke Main.log(%3::Float64)::Float64
│   %6 = invoke Main.log(%4::Float64)::Float64
│   %7 = Core.tuple(%2, %5, %6)::Tuple{Float64,Float64,Float64}
└──      return %7
) => Tuple{Float64,Float64,Float64}

My understanding is that this doesn’t really work for long tuples, so Base Julia “bails out” and uses a different approach for lengths over 16; see here for an example.

In practice, I guess in most cases you could use map, foldl, and filter, which are optimized for tuples of any length (i.e., they change strategy for long tuples).
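For instance, a minimal sketch of those Base functions applied to a tuple (the doubling function `a` here stands in for whatever you’d map):

```julia
a(e) = 2 * e          # hypothetical element-wise operation

x = (1, 2, 3, 4)
map(a, x)             # -> (2, 4, 6, 8); Base's map on tuples returns a tuple
foldl(+, x)           # -> 10
filter(isodd, x)      # -> (1, 3); filter on tuples also returns a tuple
```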


I tried @code_warntype for f and g in my script and saw a possible reason why the recursive version was preferred: f's return type was inferred as the concrete NTuple{1000,Int64}, but g's was inferred as the abstract Tuple{Vararg{Int64,N} where N}. I don’t actually know how much of an issue this type instability is, though.
Weirdly, changing g to g(x::NTuple{N,T}) where {N,T} = NTuple{N,T}(a(e) for e in x) didn’t change anything.
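For comparison, here’s a sketch of what I mean (gmap is a hypothetical map-based variant, not from my script): asking inference for the return type shows the map version staying concrete, unlike the generator-based version above.

```julia
a(e) = 2 * e
g(x) = Tuple(a(e) for e in x)   # generator-based version from the script
gmap(x) = map(a, x)             # hypothetical map-based alternative

# map over a tuple keeps a concrete inferred return type:
Base.return_types(gmap, (NTuple{4, Int},))  # -> [NTuple{4, Int64}]
```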