I’m working on a function to simplify iterating over binary trees. Ideally, this function should have the following signature:
walk_down(
    tree, starting_vertex;
    two_children, # Function vertex -> Union{Val{:left}, Val{:right}, Val{:break}}
    one_child,    # Function vertex -> Union{Val{:continue}, Val{:break}}
    leaf          # Function vertex -> nothing
)
The values returned by the callback functions indicate in which direction the traversal should continue, if at all.
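For concreteness, here is a minimal sketch of what I have in mind (the accessors nchildren, left, right and child are placeholders for whatever tree API ends up being used, not an existing interface):

# Hypothetical sketch: nchildren(tree, v) returns the number of children of
# vertex v; left/right/child return the corresponding child vertices.
function walk_down(tree, starting_vertex; two_children, one_child, leaf)
    v = starting_vertex
    while true
        n = nchildren(tree, v)
        if n == 2
            d = two_children(v)
            d == Val(:break) && return v
            v = d == Val(:left) ? left(tree, v) : right(tree, v)
        elseif n == 1
            one_child(v) == Val(:break) && return v
            v = child(tree, v)
        else # leaf: report it and stop
            leaf(v)
            return v
        end
    end
end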
Making the above work is not difficult, but this will be a very low-level function and so I want to ensure that it is optimally efficient. I therefore cooked up the following simple toy problem to beta-test the interface:
# Alternative to `for vi in v`, mimicking the above interface
function foreach(f, v)
    for vi in v
        if f(vi) == Val(:break)
            break
        end
    end
end
# Benchmarking code
function foo(n)
    s = 0
    foreach(1:n) do i
        s += i
        if i == n÷2
            return Val(:break)
        else
            return Val(:continue)
        end
    end
    return s
end
function foo_ref(n)
    s = 0
    for i in 1:n
        s += i
        if i == n÷2
            break
        end
    end
    return s
end
It turns out that the proposed interface incurs a ~44x slowdown relative to the plain loop in foo_ref.
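(The timings quoted in this post are BenchmarkTools measurements; the comparison above was run along these lines:)

using BenchmarkTools
@btime foo(1000)      # do-block / foreach version
@btime foo_ref(1000)  # plain-loop baseline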
I’m aware that I can get much more fine-grained control by turning foreach() into a macro, but that also means more work and probably a more brittle abstraction, so ideally I’d like to avoid that.
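To illustrate what I mean by the macro route, here is an untested sketch (the @foreach name and calling convention are invented for this example):

# Because the loop is spliced into the caller at expansion time, `break` and
# the update of `s` work directly; no closure is involved.
macro foreach(itervar, itr, body)
    return esc(quote
        for $itervar in $itr
            $body
        end
    end)
end

function foo_macro(n)
    s = 0
    @foreach i 1:n begin
        s += i
        i == n ÷ 2 && break
    end
    return s
end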
Have you considered AbstractTrees.jl? In case that’s not helpful or you don’t want to use it: the foreach you’re using is slow because it is type unstable. You’re constructing Val instances at runtime, which leads to a very bad kind of type instability, so you end up with dynamic dispatch everywhere. Why not simply return true or false, making it type stable?
I’m aware of AbstractTrees.jl. As far as I am aware, that package does not provide an interface for iterating over only part of a tree (e.g. a single branch from the root), so it doesn’t help for my purposes.
I’m also aware of the type instability, but it turns out this instability is fully handled by Julia’s union-splitting optimisation. Rewriting foreach() so it operates on Bool instead of Union{Val{:break}, Val{:continue}} leads to exactly the same runtime and number of allocations:
function foreach_Bool(f, v)
    for vi in v
        if f(vi)
            break
        end
    end
end
function foo_Bool(n)
    s = 0
    foreach_Bool(1:n) do i
        s += i
        if i == n÷2
            return true
        else
            return false
        end
    end
    return s
end
julia> @btime foo_Bool(1000)
11.023 μs (470 allocations: 7.34 KiB)
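As a sanity check that the instability really is just a small, splittable Union, one can ask inference directly (make_callback below is a simplified stand-in for the do-block in foo, without the accumulator):

# Expected result: [Union{Val{:break}, Val{:continue}}], a two-element Union
# that the compiler can split instead of falling back to dynamic dispatch.
make_callback(n) = i -> (i == n ÷ 2 ? Val(:break) : Val(:continue))
Base.return_types(make_callback(1000), (Int,))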
My gut feeling is that the performance penalty comes from the fact that the f passed to foreach(f, v) is a closure over s, and hence s is boxed and moved to the heap. I was hoping that the compiler would inline both foreach and f so that s could be unboxed again, but looking at @code_native foo(1000), this does not seem to be the case.
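One way to check this (my own addition, not part of the benchmark above) is to look for Core.Box in the inferred code:

using InteractiveUtils  # provides @code_warntype outside the REPL

# If the capture of `s` is boxed, it shows up among the locals as
# `s::Core.Box` and the accumulation on it is flagged as type-unstable.
@code_warntype foo(1000)

Wrapping the accumulator in a Ref instead of closing over a plain local does get rid of the allocations: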
function foo_Ref(n)
    s = Ref(0)
    foreach(1:n) do i
        s[] += i
        if i == n÷2
            return Val(:break)
        else
            return Val(:continue)
        end
    end
    return s[]
end
julia> @btime foo_Ref(1000)
257.358 ns (0 allocations: 0 bytes)
I don’t think I entirely understand why this works, though. Any reference explaining why Ref() plays so well with closures would be appreciated.
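My tentative understanding (happy to be corrected; the "Performance of captured variable" section of the manual's Performance Tips discusses this): a closure only needs a Core.Box when the captured binding itself may be reassigned. With Ref, s is bound exactly once to a Base.RefValue{Int} and never rebound; the callback merely mutates its contents, so the capture stays concretely typed. A small sketch of the difference:

# Reassigning a captured local forces a Core.Box field in the closure ...
boxed = let a = 0
    i -> (a += i)
end

# ... whereas a never-rebound Ref is captured as a concretely typed field.
unboxed = let b = Ref(0)
    i -> (b[] += i)
end

fieldtypes(typeof(boxed))    # expect (Core.Box,)
fieldtypes(typeof(unboxed))  # expect (Base.RefValue{Int64},)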
Which raises the question: why not? To answer this question myself (or at least to try, after reading a bit more about issue #15276): it seems that the Julia compiler has to decide whether or not to box s before type inference is run, so if s is assigned to it must defensively assume that s may also change type, and hence it must be boxed.
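A toy case of the situation the front end has to guard against (my own illustration):

# `x` is reassigned inside the closure and can genuinely change type at
# runtime; lowering decides what to box before inference can rule this out.
function may_change_type(flag)
    x = 1
    set! = () -> (x = "not an Int anymore")
    flag && set!()
    return x   # Int or String, depending on `flag`
end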