I’m working on a function to simplify iterating over binary trees. Ideally, this function should have the following signature:
walk_down(
    tree, starting_vertex;
    two_children, # Function vertex -> Union{Val{:left}, Val{:right}, Val{:break}}
    one_child,    # Function vertex -> Union{Val{:continue}, Val{:break}}
    leaf          # Function vertex -> nothing
)
The values returned by the callback functions indicate in which direction the traversal should continue, if at all.
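For concreteness, here is a minimal sketch of what I have in mind (the accessors nchildren, left, right and child are placeholders for whatever tree API ends up being used, not an existing interface):

# Hypothetical sketch: nchildren(tree, v) returns the number of children of
# vertex v; left/right/child return the corresponding child vertices.
function walk_down(tree, starting_vertex; two_children, one_child, leaf)
    v = starting_vertex
    while true
        n = nchildren(tree, v)
        if n == 2
            d = two_children(v)
            d == Val(:break) && return v
            v = d == Val(:left) ? left(tree, v) : right(tree, v)
        elseif n == 1
            one_child(v) == Val(:break) && return v
            v = child(tree, v)
        else # leaf: report it and stop
            leaf(v)
            return v
        end
    end
end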
Making the above work is not difficult, but this will be a very low-level function and so I want to ensure that it is optimally efficient. I therefore cooked up the following simple toy problem to beta-test the interface:
# Alternative to `for vi in v`, mimicking the above interface
function foreach(f, v)
    for vi in v
        if f(vi) == Val(:break)
            break
        end
    end
end
# Benchmarking code
function foo(n)
    s = 0
    foreach(1:n) do i
        s += i
        if i == n÷2
            return Val(:break)
        else
            return Val(:continue)
        end
    end
    return s
end
function foo_ref(n)
    s = 0
    for i in 1:n
        s += i
        if i == n÷2
            break
        end
    end
    return s
end
It turns out that the proposed interface incurs a ~44x slowdown relative to the plain loop in foo_ref.
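(The timings quoted in this post are BenchmarkTools measurements; the comparison above was run along these lines:)

using BenchmarkTools
@btime foo(1000)      # do-block / foreach version
@btime foo_ref(1000)  # plain-loop baseline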
I’m aware that I can get much more fine-grained control by turning foreach() into a macro, but that also means more work and probably a more brittle abstraction, so ideally I’d like to avoid that.
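To illustrate what I mean by the macro route, here is an untested sketch (the @foreach name and calling convention are invented for this example):

# Because the loop is spliced into the caller at expansion time, `break` and
# the update of `s` work directly; no closure is involved.
macro foreach(itervar, itr, body)
    return esc(quote
        for $itervar in $itr
            $body
        end
    end)
end

function foo_macro(n)
    s = 0
    @foreach i 1:n begin
        s += i
        i == n ÷ 2 && break
    end
    return s
end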
Have you considered AbstractTrees.jl? In case that’s not helpful or you don’t want to use it: the foreach you’re using is slow because it is type unstable. You’re constructing Val instances at runtime, which leads to a very bad kind of type instability, so you end up with dynamic dispatch everywhere. Why not simply return true or false, making it type stable?
I’m aware of AbstractTrees.jl. As far as I am aware, that package does not provide an interface for iterating over only part of a tree (e.g. a single branch from the root), so it doesn’t help for my purposes.
I’m also aware of the type instability, but it turns out this instability is fully handled by Julia’s union-splitting optimisation. Rewriting foreach() so it operates on Bool instead of Union{Val{:break}, Val{:continue}} leads to exactly the same runtime and number of allocations:
function foreach_Bool(f, v)
    for vi in v
        if f(vi)
            break
        end
    end
end
function foo_Bool(n)
    s = 0
    foreach_Bool(1:n) do i
        s += i
        if i == n÷2
            return true
        else
            return false
        end
    end
    return s
end
julia> @btime foo_Bool(1000)
11.023 μs (470 allocations: 7.34 KiB)
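As a sanity check that the instability really is just a small, splittable Union, one can ask inference directly (make_callback below is a simplified stand-in for the do-block in foo, without the accumulator):

# Expected result: [Union{Val{:break}, Val{:continue}}], a two-element Union
# that the compiler can split instead of falling back to dynamic dispatch.
make_callback(n) = i -> (i == n ÷ 2 ? Val(:break) : Val(:continue))
Base.return_types(make_callback(1000), (Int,))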
My gut feeling is that the performance penalty comes from the fact that the f passed to foreach(f, v) is a closure over s, and hence s is boxed and moved to the heap. I was hoping that the compiler would inline both foreach and f so that s could be unboxed again, but looking at @code_native foo(1000), this does not seem to be the case.
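One way to check this (my own addition, not part of the benchmark above) is to look for Core.Box in the inferred code:

using InteractiveUtils  # provides @code_warntype outside the REPL

# If the capture of `s` is boxed, it shows up among the locals as
# `s::Core.Box` and the accumulation on it is flagged as type-unstable.
@code_warntype foo(1000)

Wrapping the accumulator in a Ref instead of closing over a plain local does get rid of the allocations: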
function foo_Ref(n)
    s = Ref(0)
    foreach(1:n) do i
        s[] += i
        if i == n÷2
            return Val(:break)
        else
            return Val(:continue)
        end
    end
    return s[]
end
julia> @btime foo_Ref(1000)
257.358 ns (0 allocations: 0 bytes)
I don’t think I entirely understand why this works, though. Any reference explaining why Ref() plays so well with closures would be appreciated.
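My tentative understanding (happy to be corrected; the "Performance of captured variable" section of the manual's Performance Tips discusses this): a closure only needs a Core.Box when the captured binding itself may be reassigned. With Ref, s is bound exactly once to a Base.RefValue{Int} and never rebound; the callback merely mutates its contents, so the capture stays concretely typed. A small sketch of the difference:

# Reassigning a captured local forces a Core.Box field in the closure ...
boxed = let a = 0
    i -> (a += i)
end

# ... whereas a never-rebound Ref is captured as a concretely typed field.
unboxed = let b = Ref(0)
    i -> (b[] += i)
end

fieldtypes(typeof(boxed))    # expect (Core.Box,)
fieldtypes(typeof(unboxed))  # expect (Base.RefValue{Int64},)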
Which raises the question: why not? To answer this question myself (or at least to try, after reading a bit more about issue #15276): it seems that the Julia compiler has to decide whether or not to box s before type inference is run, so if s is assigned to it must defensively assume that s may also change type, and hence it must be boxed.
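A toy case of the situation the front end has to guard against (my own illustration):

# `x` is reassigned inside the closure and can genuinely change type at
# runtime; lowering decides what to box before inference can rule this out.
function may_change_type(flag)
    x = 1
    set! = () -> (x = "not an Int anymore")
    flag && set!()
    return x   # Int or String, depending on `flag`
end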