I’m trying to make a recursive reduction over a `Tuple` that calculates the inverse of the reduction of a nested `Tuple` every time it encounters one. I.e., for `x = (1, (2, 3))`, `foo(x)` should compute $f(1) + \frac{1}{f(2) + f(3)}$. The simplified MWE below works well in the absence of recursion, but otherwise it starts allocating. As `foo()` ideally needs to run several hundred thousand times, I would need it not to allocate any memory if possible. On a side note, I also do not understand the difference in performance and memory allocations between the first and second `@btime foo($x1)` calls.

```julia
using BenchmarkTools

struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end

@inline f(x::myStruct1, a) = a / x.x * 2
@inline f(x::myStruct2, a) = a * x.x / 2

@inline function foo(x)
    c = Ref(0.0)
    ntuple(Val(length(x))) do i
        Base.@_inline_meta
        if !(x[i] isa Tuple)
            c[] += f(x[i], rand())
        else
            c[] += 1 / foo(x[i])
        end
    end
    c[]
end

a = myStruct1(1)
b = myStruct2(1)
x1 = (a, ((b, a), b))
x2 = (a, b, a, b)

@btime foo($x1) # 1st call: 33.710 ns (3 allocations: 96 bytes)
@btime foo($x1) # 2nd call: 17.940 ns (1 allocation: 16 bytes)
@btime foo($x2) # 8.373 ns (0 allocations: 0 bytes)
```

A warm welcome to the community.

I think the problem you encounter is that

```julia
ntuple(Val(length(x))) do i
    Base.@_inline_meta
    if !(x[i] isa Tuple)
        c[] += f(x[i], rand())
    else
        c[] += 1 / foo(x[i])
    end
end
```

builds a tuple of the intermediate sums which is never used but, as it seems, is not optimized away by the compiler.
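The point can be seen in isolation: `ntuple` always materializes the tuple of its closure's return values, whether or not the caller keeps it. A minimal sketch, independent of the original code:

```julia
# In foo the closure returns the running sum c[] on every iteration,
# so an NTuple{N, Float64} of partial sums is constructed and then
# thrown away.
c = Ref(0.0)
t = ntuple(i -> (c[] += i; c[]), Val(3))
# t == (1.0, 3.0, 6.0): the partial sums, kept alive as a tuple
```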

You can get what you want by using a different kind of meta-programming magic.

```julia
bar(x) = bar(x, Val(length(x)))

@inline @generated function bar(x, ::Val{N}) where N
    quote
        c = 0.0
        Base.Cartesian.@nexprs $N i -> c += x[i] isa Tuple ? 1 / bar(x[i], Val(length(x[i]))) : f(x[i], rand())
        return c
    end
end
```

yields

```julia
a = myStruct1(1)
b = myStruct2(1)
x1 = ((b, (a, b)), (((a, (a, (b, b))), a), b))

@btime bar($x1) # 36.172 ns (0 allocations: 0 bytes)
@btime foo($x1) # 154.228 ns (12 allocations: 432 bytes)
```
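For intuition on why this avoids the tuple: `@nexprs N i -> expr` simply pastes `N` copies of `expr` with the literal index substituted for `i`, so the generated body of `bar` is a fully unrolled sum with no intermediate collection. A quick way to see the expansion:

```julia
# Expand @nexprs on a toy expression; the result is a block of
# N statements with i replaced by 1, 2, 3.
ex = @macroexpand Base.Cartesian.@nexprs 3 i -> c += x[i]
# ex is, up to line-number nodes:
#   quote
#       c += x[1]
#       c += x[2]
#       c += x[3]
#   end
```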

On your side note: if you evaluate `foo(x1)` once prior to the benchmark, both benchmarks yield the same result. Actually, this should not happen with `BenchmarkTools`. I have no idea why it does.


Alternatively, recursive calls also work well for iterating over heterogeneous tuples like yours:

```julia
@inline goo(x::Tuple) = _goo(0.0, x...)
@inline _goo(c, xi::Tuple, args...) = _goo(c + 1 / goo(xi), args...)
@inline _goo(c, xi, args...) = _goo(c + f(xi, rand()), args...)
@inline _goo(c) = c # this is the final result
```

In my tests this is as efficient as @moeddel’s nice solution using `@generated` functions:

```julia
julia> x1 = ((b, (a, b)), (((a, (a, (b, b))), a), b))

julia> @btime foo($x1)
  178.637 ns (12 allocations: 432 bytes)

julia> @btime bar($x1)
  37.526 ns (0 allocations: 0 bytes)

julia> @btime goo($x1)
  36.720 ns (0 allocations: 0 bytes)
```
