I’m trying to make a recursive reduction over a Tuple that computes the inverse of the reduction of every nested Tuple it encounters. I.e., for x = (1, (2, 3)), foo(x) should compute f(1) + 1/(f(2) + f(3)). The simplified MWE below works well in the absence of recursion, but otherwise it starts allocating. Since foo() ideally needs to run several hundred thousand times, I would need it not to allocate any memory if possible. On a side note, I also do not understand the difference in performance and memory allocations between the first and second @btime foo($x1) calls.
struct myStruct1{T}
    x::T
end

struct myStruct2{T}
    x::T
end

@inline f(x::myStruct1, a) = a/x.x*2
@inline f(x::myStruct2, a) = a*x.x/2

@inline function foo(x)
    c = Ref(0.0)
    ntuple(Val(length(x))) do i
        Base.@_inline_meta
        if !(x[i] isa Tuple)
            c[] += f(x[i], rand())
        else
            c[] += 1/(foo(x[i]))
        end
    end
    c[]
end
a = myStruct1(1)
b = myStruct2(1)
x1 = (a, ((b, a), b))
x2 = (a, b, a, b)
@btime foo($x1) # 1st call 33.710 ns (3 allocations: 96 bytes)
@btime foo($x1) # 2nd call 17.940 ns (1 allocation: 16 bytes)
@btime foo($x2) # 8.373 ns (0 allocations: 0 bytes)
A warm welcome to the community.
I think the problem you encounter is that
ntuple(Val(length(x))) do i
    Base.@_inline_meta
    if !(x[i] isa Tuple)
        c[] += f(x[i], rand())
    else
        c[] += 1/(foo(x[i]))
    end
end
creates a tuple which is never used but which, it seems, is not optimized away by the compiler.
You can get what you want by using a different kind of meta-programming magic.
bar(x) = bar(x, Val(length(x)))

@inline @generated function bar(x, ::Val{N}) where N
    quote
        c = 0.0
        Base.Cartesian.@nexprs $N i -> c += x[i] isa Tuple ? 1/(bar(x[i], Val(length(x[i])))) : f(x[i], rand())
        return c
    end
end
yields
a = myStruct1(1)
b = myStruct2(1)
x1 = ((b, (a, b)), (((a, (a,(b,b))), a), b))
@btime bar($x1) # 36.172 ns (0 allocations: 0 bytes)
@btime foo($x1) # 154.228 ns (12 allocations: 432 bytes)
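For intuition, here is roughly what the @generated body unrolls to for a 2-element tuple. The hand-expanded bar_unrolled below is my own illustration, not code from the solution; the structs and f are repeated from the question so the snippet runs standalone.

```julia
# Structs and f repeated from the question so this snippet runs standalone.
struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end
@inline f(x::myStruct1, a) = a/x.x*2
@inline f(x::myStruct2, a) = a*x.x/2

bar(x) = bar(x, Val(length(x)))
@inline @generated function bar(x, ::Val{N}) where N
    quote
        c = 0.0
        Base.Cartesian.@nexprs $N i -> c += x[i] isa Tuple ? 1/(bar(x[i], Val(length(x[i])))) : f(x[i], rand())
        return c
    end
end

# Hand-expanded equivalent of the @nexprs body for N = 2: straight-line
# code, so there is no intermediate tuple to allocate or optimize away.
function bar_unrolled(x)
    c = 0.0
    c += x[1] isa Tuple ? 1/(bar(x[1], Val(length(x[1])))) : f(x[1], rand())
    c += x[2] isa Tuple ? 1/(bar(x[2], Val(length(x[2])))) : f(x[2], rand())
    return c
end

bar_unrolled((myStruct1(1), (myStruct2(1), myStruct1(1))))
```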
On your side note: if you evaluate foo(x1) once, prior to the benchmark, both benchmarks yield the same result. Actually, this should not happen with BenchmarkTools; I have no idea why it does.
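As an aside, the unused tuple can also be avoided without @generated by folding over the tuple. This is a sketch of my own (the name baz and the foldl formulation are mine, and I have not benchmarked it as carefully as bar); the structs and f are repeated from the question so it runs standalone.

```julia
# Structs and f repeated from the question so this snippet runs standalone.
struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end
@inline f(x::myStruct1, a) = a/x.x*2
@inline f(x::myStruct2, a) = a*x.x/2

# Fold over the tuple instead of building one with ntuple; foldl over a
# tuple is unrolled at compile time, so no result tuple is constructed.
@inline baz(x::Tuple) =
    foldl((c, xi) -> xi isa Tuple ? c + 1/baz(xi) : c + f(xi, rand()), x; init = 0.0)

a = myStruct1(1)
b = myStruct2(1)
baz((a, ((b, a), b)))  # same nesting as x1 in the question
```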
Alternatively, recursive calls can also work well when iterating over heterogeneous tuples like yours:
@inline goo(x::Tuple) = _goo(0.0, x...)
@inline _goo(c, xi::Tuple, args...) = _goo(c + 1 / goo(xi), args...)
@inline _goo(c, xi, args...) = _goo(c + f(xi, rand()), args...)
@inline _goo(c) = c # this is the final result
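To see how the dispatch unwinds, here is my hand-trace for a flat pair (the structs and f are repeated from the question so the snippet runs standalone):

```julia
# Structs and f repeated from the question so this snippet runs standalone.
struct myStruct1{T}
    x::T
end
struct myStruct2{T}
    x::T
end
@inline f(x::myStruct1, a) = a/x.x*2
@inline f(x::myStruct2, a) = a*x.x/2

@inline goo(x::Tuple) = _goo(0.0, x...)
@inline _goo(c, xi::Tuple, args...) = _goo(c + 1 / goo(xi), args...)
@inline _goo(c, xi, args...) = _goo(c + f(xi, rand()), args...)
@inline _goo(c) = c

# For a flat pair (a, b) the calls unwind as:
#   goo((a, b))                  splats the tuple into
#   _goo(0.0, a, b)              a is not a Tuple, so the non-Tuple method:
#   _goo(0.0 + f(a, r1), b)      b is not a Tuple, so the non-Tuple method:
#   _goo(c + f(b, r2))           one argument left: base case returns c
goo((myStruct1(1), myStruct2(1)))
```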
In my tests this is as efficient as @moeddel’s nice solution using @generated
functions:
julia> x1 = ((b, (a, b)), (((a, (a,(b,b))), a), b))
julia> @btime foo($x1)
178.637 ns (12 allocations: 432 bytes)
julia> @btime bar($x1)
37.526 ns (0 allocations: 0 bytes)
julia> @btime goo($x1)
36.720 ns (0 allocations: 0 bytes)