Efficient recursive tuple construction

I’m wondering if there is an efficient way to build a tuple recursively. I’m working with a three-term recurrence relation (specifically, this one).

The first and second terms are known. From them, all the other terms can be computed one by one. The number of terms is known only at run time.

I can allocate a vector and fill in the sequence, but I’m wondering if there might be tricks to avoid doing such an allocation. I tried a recursive splatting (slurping?) approach to build the tuple,

function Tₖ(T::Tuple, ξ, n::Int)
    if length(T) == n
        return T
    else
        return Tₖ((T..., 2*ξ*T[end] - T[end-1]), ξ, n)
    end
end

but this results in horrible type instability when indexing the tuple.
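For instance, inference can't assign a concrete return type to it (a sketch, restating the same function under a throwaway name so it can be inspected):

```julia
# Restating the recursive-splat version under a hypothetical name so we can
# inspect what type inference makes of it.
function Tk_splat(T::Tuple, ξ, n::Int)
    length(T) == n && return T
    return Tk_splat((T..., 2*ξ*T[end] - T[end-1]), ξ, n)
end

# Because the tuple grows with the runtime value `n`, inference cannot pin
# down a concrete return type, and every caller pays for the uncertainty:
rt = only(Base.return_types(Tk_splat, Tuple{Tuple{Int,Int}, Int, Int}))
isconcretetype(rt)  # expected to be false
```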

Any ideas would be appreciated!

Here is something that may serve as inspiration:

julia> function Tₖ(t::Tuple,::Val{N},::Val{M}) where {N,M}
          ntuple(i -> i <= N ? t[i] : 1, M)
       end
Tₖ (generic function with 1 method)

julia> t = (0,0,0);

julia> @btime Tₖ($t,Val(length($t)),Val(8))
  0.021 ns (0 allocations: 0 bytes)
(0, 0, 0, 1, 1, 1, 1, 1)

Note that this will be compiled once for every combination of N and M, so it will be slow if the recursion occurs only once in your code. If you have to repeat the recursion many times, it may pay off (note: the 0.021 ns figure is certainly not meaningful there, but the call is allocation-free and fast).

A tuple is probably the wrong data structure if the length is only known at runtime. This is what a Vector is for.
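For what it's worth, a minimal Vector-based version is type-stable and simple (a sketch with hypothetical names, assuming the Chebyshev-style recurrence from the code above):

```julia
# Fill a Vector with the three-term recurrence Tₖ = 2ξ·Tₖ₋₁ − Tₖ₋₂,
# starting from two known terms t1, t2; n is only known at run time.
function T_vec(t1, t2, ξ, n::Int)
    out = Vector{promote_type(typeof(t1), typeof(t2), typeof(ξ))}(undef, n)
    n >= 1 && (out[1] = t1)
    n >= 2 && (out[2] = t2)
    for k in 3:n
        out[k] = 2*ξ*out[k-1] - out[k-2]
    end
    return out
end
```

One allocation per call, and the element type adapts to the inputs.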


Allocation may be unavoidable, but it seemed like it might be possible to avoid it because all values of the sequence are known, in a sense. They’re all determined by a single input.

So is the halting problem.

Sure, but we know exactly how and when this recurrence will halt.

You know, but the compiler does not. And the latter point is crucial for performance.

Well, is there a way to tell the compiler? The number of terms is known at run time, so could one use it for dispatch?

This is what I was trying above. In your case, it would be:

julia> function Tₖ(T::Tuple, ξ, ::Val{N}, ::Val{M}) where {N,M}
           N == M && return T
           tup = ntuple(N+1) do i 
               if i <= N
                   return T[i]
               elseif i == N + 1
                   return 2*ξ*T[N] - T[N-1]
               end
           end      
           Tₖ(tup, ξ, Val(N+1), Val(M))
       end
Tₖ (generic function with 1 method)

julia> using BenchmarkTools

julia> Tₖ((1,2),3,Val(2),Val(9))
(1, 2, 11, 64, 373, 2174, 12671, 73852, 430441)

julia> @btime Tₖ((1,2),3,Val(2),Val(9))
  77.598 ns (5 allocations: 336 bytes)
(1, 2, 11, 64, 373, 2174, 12671, 73852, 430441)

It apparently performs slightly better than your original code, but I could not completely get rid of the allocations, because the length (and hence the type) of the returned tuple is unknown to the compiler, causing a type instability.

julia> function Tₖ(T::Tuple, ξ, n::Int)
           if length(T) == n
               return T
           else
               return Tₖ((T..., 2*ξ*T[end] - T[end-1]), ξ, n)
           end
       end
Tₖ (generic function with 2 methods)

julia> @btime Tₖ((1,2),3,9)
  130.979 ns (7 allocations: 464 bytes)
(1, 2, 11, 64, 373, 2174, 12671, 73852, 430441)

julia> @btime Tₖ((1,2),3,9)
  108.886 ns (6 allocations: 416 bytes)
(1, 2, 11, 64, 373, 2174, 12671, 73852, 430441)


Yes I was trying some similar approaches but never got results faster than simply allocating and filling a vector :woman_shrugging:

What about this (disclaimer: I think the numerical result is wrong, but that is a detail)? Instead of recursing, repeat the calculations, since calculations are much cheaper than allocations:

julia> function Tₖ(T::Tuple, ξ, ::Val{M}) where {M}
           tup = ntuple(M) do i 
               if i <= length(T)
                   return T[i]
               else
                   el2 = T[end-1]
                   el1 = 2*ξ*T[end]
                   for j in 1:i-length(T)
                       el2 = el1
                       el1 = 2*ξ*el2
                   end
                   return el1
               end
           end      
           return tup
       end
Tₖ (generic function with 3 methods)

julia> Tₖ((1,2),3,Val(9))
(1, 2, 72, 432, 2592, 15552, 93312, 559872, 3359232)

julia> @btime Tₖ((1,2),3,Val(9))
  8.022 ns (0 allocations: 0 bytes)
(1, 2, 72, 432, 2592, 15552, 93312, 559872, 3359232)


That’s an interesting idea. I tried the following implementation

function Tk_(ξ, L::Int)
    L == 1 && return(one(ξ))
    L == 2 && return(ξ)
    Tₖ₋₂ = one(ξ) 
    Tₖ₋₁ = ξ
    Tₖ = zero(ξ)
    for _ = 3:L
        Tₖ = 2ξ*Tₖ₋₁ - Tₖ₋₂
        Tₖ₋₂ = Tₖ₋₁
        Tₖ₋₁ = Tₖ
    end
    return(Tₖ)
end

function Tk(ξ::U, ::Val{L})::NTuple{L,U} where {L,U}
    ntuple(i->Tk_(ξ,i), L)
end

BUT, weirdly, there is a huge performance dropoff when the number of elements exceeds 10:

using BenchmarkTools

julia> @btime Tk(0.1, Val(9))
  7.900 ns (0 allocations: 0 bytes)

julia> @btime Tk(0.1, Val(10))
  8.709 ns (0 allocations: 0 bytes)

julia> @btime Tk(0.1, Val(11))
  426.633 ns (13 allocations: 448 bytes)

julia> @btime Tk(0.1, Val(12))
  463.452 ns (14 allocations: 480 bytes)

Suddenly, at 11 or more elements, there are allocations and a considerable slowdown. This is totally confusing to me.


Maybe inlining the function helps? Just a guess.
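e.g. something like this (an untested guess, reusing the `Tk_`/`Tk` names from above):

```julia
# Same scalar kernel as above, but with an explicit @inline annotation, in
# case the compiler's heuristic stops inlining it at larger tuple lengths.
@inline function Tk_(ξ, L::Int)
    L == 1 && return one(ξ)
    L == 2 && return ξ
    Tₖ₋₂ = one(ξ)
    Tₖ₋₁ = ξ
    Tₖ = zero(ξ)
    for _ = 3:L
        Tₖ = 2ξ*Tₖ₋₁ - Tₖ₋₂
        Tₖ₋₂ = Tₖ₋₁
        Tₖ₋₁ = Tₖ
    end
    return Tₖ
end

Tk(ξ::U, ::Val{L}) where {L,U} = ntuple(i -> Tk_(ξ, i), Val(L))
```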

It looks like it inlines everything until then, but reverts to a normal function call once it decides things are too big?

One way around is to make a generated function:

julia> @generated function Tkgen(T::Tuple, ξ, ::Val{M}) where {M}
           tup = ntuple(M) do i 
               if i <= length(T.parameters)
                   return :(T[$i])
               else
                   el2 = :(T[end-1])
                   el1 = :(2*ξ*T[end])
                   for j in 1:i-length(T.parameters)
                       el2 = el1
                       el1 = :(2*ξ*$el2)
                   end
                   return el1
               end
           end      
           return :(($(tup...),))
       end
Tkgen (generic function with 1 method)

julia> @btime Tkgen((1,2),3,Val(9))
  0.001 ns (0 allocations: 0 bytes)
(1, 2, 72, 432, 2592, 15552, 93312, 559872, 3359232)

julia> @btime Tkgen((1,2),3,Val(15))
  0.001 ns (0 allocations: 0 bytes)
(1, 2, 72, 432, 2592, 15552, 93312, 559872, 3359232, 20155392, 120932352, 725594112, 4353564672, 26121388032, 156728328192)

julia> @btime map(x -> Tkgen((1,2),x,Val(9)), $(rand(1000)));  # a more realistic test
  13.041 μs (2 allocations: 70.36 KiB)

julia> @btime map(x -> Tₖ((1,2),x,Val(9)), $(rand(1000)));  # same speed
  13.042 μs (2 allocations: 70.36 KiB)

julia> @btime map(x -> Tkgen((1,2),x,Val(15)), $(rand(1000)));
  15.666 μs (2 allocations: 117.23 KiB)

Does a generated function work for you if the number of terms is a little larger? Say, 100? I gave it a try and had issues with larger numbers of terms.

What sort of issues?

I didn’t try, but 100 is very long for a tuple. For most purposes you will want to switch to a vector long before that; many Base functions switch at 16 or 32.


Critically slow. But you’re totally right. It’s getting pretty hairy and the solution is probably just to use vectors.

If anyone’s interested, the underlying issue has to do with doing Chebyshev interpolation in 2D. I had been using pre-allocated vectors for the chebyshev expansions, but pre-allocating strictly limits the type of the interpolation coordinate. Now I’m just dispatching on the coordinate type—using the pre-allocated vectors when possible and allocating otherwise.
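As a rough sketch of that dispatch pattern (hypothetical names, assuming `Float64` is the preallocated case):

```julia
# Hypothetical sketch: reuse a preallocated Float64 buffer when the
# coordinate is a Float64, and allocate a fresh buffer of the right
# element type otherwise (e.g. for dual numbers during autodiff).
struct ChebCache
    buf::Vector{Float64}
end

# Fill `out` with the Chebyshev recurrence Tₖ = 2ξ·Tₖ₋₁ − Tₖ₋₂.
function fill_terms!(out, ξ)
    n = length(out)
    n >= 1 && (out[1] = one(ξ))
    n >= 2 && (out[2] = ξ)
    for k in 3:n
        out[k] = 2ξ*out[k-1] - out[k-2]
    end
    return out
end

# Fast path: reuse the preallocated buffer.
cheb_terms(c::ChebCache, ξ::Float64) = fill_terms!(c.buf, ξ)
# Generic fallback: allocate a buffer matching the coordinate type.
cheb_terms(c::ChebCache, ξ) = fill_terms!(similar(c.buf, typeof(ξ)), ξ)
```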

The relevant methods are here.


See e.g. GitHub - stevengj/FastChebInterp.jl: fast multidimensional Chebyshev interpolation in Julia

:exploding_head: I didn’t know that was being worked on!

I’m not sure why, but I’m generally getting ~2x better speed. Happy to share the benchmarking code.

[benchmark plots omitted: one_dimension and two_dimensions]

Note that the issue I was asking about in this post doesn’t apply in these tests; the interpolation is all allocation-free. For other coordinate types the operation is forced to allocate and slows down a little.

Sure, could you file an issue?