Allocations in a reduction over a Tuple

I’m trying to write a recursive reduction over a Tuple that computes the inverse of the reduction of a nested Tuple every time it encounters one. I.e., for `x = (1, (2, 3))`, `foo(x)` should compute `f(1) + 1/(f(2) + f(3))`. The simplified MWE below works well in the absence of recursion, but otherwise it starts allocating. Since `foo()` ideally needs to run several hundred thousand times, I would need it not to allocate any memory if possible. On a side note, I also do not understand the difference in performance and memory allocations between the first and second `@btime foo($x1)` call.

```julia
using BenchmarkTools

struct myStruct1{T}
    x::T
end

struct myStruct2{T}
    x::T
end

@inline f(x::myStruct1, a) = a / x.x * 2
@inline f(x::myStruct2, a) = a * x.x / 2

@inline function foo(x)
    c = Ref(0.0)
    ntuple(Val(length(x))) do i
        Base.@_inline_meta
        if !(x[i] isa Tuple)
            c[] += f(x[i], rand())
        else
            c[] += 1 / foo(x[i])
        end
    end
    c[]
end

a = myStruct1(1)
b = myStruct2(1)
x1 = (a, ((b, a), b))
x2 = (a, b, a, b)

@btime foo($x1) # 1st call: 33.710 ns (3 allocations: 96 bytes)
@btime foo($x1) # 2nd call: 17.940 ns (1 allocation: 16 bytes)
@btime foo($x2) #           8.373 ns (0 allocations: 0 bytes)
```


A warm welcome to the community.

I think the problem you encounter is that

```julia
ntuple(Val(length(x))) do i
    Base.@_inline_meta
    if !(x[i] isa Tuple)
        c[] += f(x[i], rand())
    else
        c[] += 1 / foo(x[i])
    end
end
```


creates a tuple which is not used but, as it seems, is not optimized away by the compiler.
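As a quick cross-check (this sketch is my own, not part of the original reply; `foo2` is a hypothetical name): if the tuple produced by `ntuple` is actually used, e.g. by summing it, the `Ref` accumulator becomes unnecessary and the allocations should disappear. The definitions from the question are repeated so the snippet is self-contained:

```julia
struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end
@inline f(x::myStruct1, a) = a / x.x * 2
@inline f(x::myStruct2, a) = a * x.x / 2

# Build the per-element contributions as an NTuple{N,Float64} and sum it;
# since the tuple is consumed, no Ref is needed and nothing should escape
# to the heap.
@inline function foo2(x)
    t = ntuple(Val(length(x))) do i
        Base.@_inline_meta
        x[i] isa Tuple ? 1 / foo2(x[i]) : f(x[i], rand())
    end
    sum(t)
end

a = myStruct1(1)
b = myStruct2(1)
foo2((a, ((b, a), b)))  # same shape of computation as foo(x1)
```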

You can get what you want by using a different kind of meta-programming magic:

```julia
bar(x) = bar(x, Val(length(x)))

@inline @generated function bar(x, ::Val{N}) where {N}
    quote
        c = 0.0
        Base.Cartesian.@nexprs $N i -> c += x[i] isa Tuple ? 1 / bar(x[i], Val(length(x[i]))) : f(x[i], rand())
        return c
    end
end
```

This yields:

```julia
a = myStruct1(1)
b = myStruct2(1)
x1 = ((b, (a, b)), (((a, (a, (b, b))), a), b))

@btime bar($x1) # 36.172 ns (0 allocations: 0 bytes)
@btime foo($x1) # 154.228 ns (12 allocations: 432 bytes)
```

On your side note: if you evaluate `foo(x1)` once prior to the benchmark, both benchmarks yield the same result. Actually, this should not happen with BenchmarkTools; I have no idea why it does.

Alternatively, recursive calls can also work well when iterating over heterogeneous tuples like yours:

```julia
@inline goo(x::Tuple) = _goo(0.0, x...)
@inline _goo(c, xi::Tuple, args...) = _goo(c + 1 / goo(xi), args...)
@inline _goo(c, xi, args...) = _goo(c + f(xi, rand()), args...)
@inline _goo(c) = c  # this is the final result
```

In my tests this is as efficient as @moeddel’s nice solution using `@generated` functions:

```julia
julia> x1 = ((b, (a, b)), (((a, (a, (b, b))), a), b))

julia> @btime foo($x1)
  178.637 ns (12 allocations: 432 bytes)

julia> @btime bar($x1)
  37.526 ns (0 allocations: 0 bytes)

julia> @btime goo($x1)
  36.720 ns (0 allocations: 0 bytes)
```
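For completeness, a third formulation (my own sketch, not from the thread; `foo_fold` is a hypothetical name) expresses the same reduction with `foldl`, which Base implements recursively for tuples, so it is typically fully unrolled at compile time. The definitions from the question are repeated for self-containment:

```julia
struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end
@inline f(x::myStruct1, a) = a / x.x * 2
@inline f(x::myStruct2, a) = a * x.x / 2

# foldl over a Tuple recurses on the tuple's elements, so the accumulator
# stays a plain Float64; this should not need to allocate once compiled.
@inline foo_fold(x::Tuple) =
    foldl((c, xi) -> c + (xi isa Tuple ? 1 / foo_fold(xi) : f(xi, rand())), x; init = 0.0)

a = myStruct1(1)
b = myStruct2(1)
foo_fold(((b, (a, b)), (((a, (a, (b, b))), a), b)))  # same reduction as bar/goo
```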
