Tail-call optimization/elimination was brought up in several threads like this one in discourse and this one in GitHub. The main interest seems to be purely based on the nice-to-have performance benefit and the core devs seem to be open to the change while assigning very low priority because writing for
/while
-loop is more idiomatic in Julia.
But I think there is another point of view related to function-barrier -based accumulation which makes me wonder if the tail-call optimization actually makes sense in Julia. By “function-barrier -based accumulation” I mean the pattern in Julia to invoke recursion in the middle of the loop when the type of an accumulation variable is changed or required to change. This pattern is used, e.g., in the implementation of collect
As you can see from the quoted example, this kind of recursion happens at the tail-call position because the whole purpose of this is to resume the loop while introducing the function-barrier.
Generalizing this idea, I was tempted to define a foldl
implementing the function-barrier -based type widening/refinement:
function foldlfb(op, init, iterator)
ret = iterate(iterator)
ret === nothing && return init
x, state = ret
return foldlfb_impl(op, op(init, x), iterator, state)
end
function foldlfb_impl(op, acc::T, iterator, state) where T
while (ret = iterate(iterator, state)) !== nothing
x, state = ret
y = op(acc, x)
y isa T || return foldlfb_impl(op, y, iterator, state)
acc = y
end
return acc
end
This works well for the case like collect
julia> foldlfb(vcat, Union{}[], [1.0, missing, nothing])
3-element Array{Union{Missing, Nothing, Float64},1}:
1.0
missing
nothing
(Of course, vcat
is not practically useful because it copies the array all the time. One would need something like store!_or_widen
by @Tamas_Papp to get a good performance.)
However, this foldlfb
works nicely only if the reducing function op
“monotonically” (in a loose sense) changes the output type. For example, vcat
always increase the eltype
and that’s why I described that it “monotonically” changes the type. However, not all functions have this property. A simple example would be:
julia> typechanging(::Any, x) = Val(x % 3)
typechanging (generic function with 1 method)
julia> foldlfb(typechanging, Val(0), 1:10)
Val{1}()
julia> foldlfb(typechanging, Val(0), 1:10^4)
ERROR: StackOverflowError:
So, this naive implementation of foldlfb
is not robust enough for practice usage.
Notice that tail-call optimization could have eliminated the StackOverflowError
. It would likely generate a non-type stable code which can be fine in many cases. If a user is passing type changing op
it’s likely that what s/he is expecting.
This led me to the idea that:
Tail-call optimization actually makes sense in Julia because the function-barrier is a common idiom to drop into a highly optimized computing kernel in the middle of dynamic code. So, why not making easy to drop into highly optimized loop during initially non-type stable loop?
Of course, this idea would be irrelevant if Julia can automatically optimize a loop
for x in iterator
acc = op(acc, x)
end
to implement the strategy that is manually done using function-barrier. If there are optimizations that can be done with tail-calls then maybe these are already doable in loops? At the same time, I wonder if the tail calls can be used as a way to communicate with the compiler that the programmer is writing a loop that may be type-stabilized at some point.
A very mechanical and naive transformation you can do to foldlfb_impl
is to add a counter to limit the tail calls
function foldlfb_impl(op, acc::T, iterator, state, limit::Val{n}=Val(10)) where {T, n}
while (ret = iterate(iterator, state)) !== nothing
x, state = ret
y = op(acc, x)
n === 0 || y isa T ||
return foldlfb_impl(op, y, iterator, state, Val(n-1))
acc = y
end
return acc
end
This lets you execute foldlfb(typechanging, Val(0), 1:10^4)
now. As the above transformation is very mechanical, I’d imagine that foldlfb_impl
defined using a tail call like the following can be transformed into the above version using the loop with limited recursion.
function foldlfb_impl(op, acc, iterator, state)
ret = iterate(iterator, state)
ret === nothing && return acc
x, state = ret
return foldlfb_impl(op, op(acc, x), iterator, state)
end
Does such optimization make sense? Would it be useful? Is it implementable in more general cases that the compiler has to handle? Are there any better strategies than simply limiting the recursions?