Iterating over a Tree recursively with Base.iterate

I want to implement Base.iterate so that it can recursively iterate over a tree - it seems to me that gives the best performance.

The type is defined as follows:

mutable struct node
    next::Union{node,Missing}
    first_child::Union{node,Missing}
    last_child::Union{node,Missing}
end

next points to the next child of the same ancestor (basically a linked list)
first_child points to the first child of a node.

I would like to have Base.iterate implemented such that it returns the nodes in Pre-order (Neutral, Left, Right)

Quick example:

          A
      /   |   \
     B    D    E
    /        /   \
  C         F     G

Note: A through G are all of type node. B.next is D, because they are both children of A. However C.next is “missing”; B has no other children.

In a call like

for element in A
    element
end

I would like the output to be A - B - C - D - E - F - G

My Impressions/Attempts (you can skip this, if you already know a good answer)
I tried to implement Base.iterate(elem::node,state) with recursive calls to Base.iterate(elem::node,state) in its definition. This would either result in an endless loop or only return the first element.
I tried to use the Continuables package but got the same result.

A problem i suspect is, that the elements returned by the iterator must be returned as return (element,state)in a function with signature Base.iterate(elem::node, state). That implies however that the function ends at that point; however i would like to do something like:

return (elem,state)          #first, return the current node
#for all children of elem
    Base.iterate(child)      #then, return the children and their subtrees
end

It seems (to me) that that would be the correct way to enter the subtrees recursively; However the return statement puts an early stop to things (all that would be needed to remedy this situation would be a return statement, that yields the element, but doesnt stop the program.)

Question 1: For future reference: How can i implement Base.iterate in a recursive fashion like that? (This question is meant generally, not necessarily involving my specific node implementation here)

Question 2: What do you think would be the best/most performant way to implement an iterator for a tree-structure, that returns the elements in Pre-order?

Is there are particular reason why you would like to build the iterator yourself or would AbstractTrees.jl also suffice? It also contains iterators for trees.

1 Like

I suppose AbstractTrees might work, havent checked that out. I needed a lot of clarity of what exactly my code does, so i tried to avoid it. That could be a nice answer to Question 2. I am still interested in how one would make a recursive iterator though - will look into AbstractTrees source.

Edit: The project i am working on relies on an API where a tree is implied the way i showed in the first post. I fear using an implementation with a different structure for trees might mess things up.

I implemented a small program with AbstractTrees and the iteration is way worse using abstract trees. To collect the results of an iteration over a tree with 10K nodes a structure like above node took a bit over 4ms, a structure with abstract trees took 225ms.

Did i maybe misuse AbstractTrees, i wonder…

First if all, welcome to our community! :confetti_ball:

Some first comments:

  1. You want to use Nothing and nothing here. Missing is conceptually wrong.
  2. The default style is that the name of structs should be capitalized.
  3. What is the utility of last_child? It seems entirely unnecessary.

Answer 1: Base.iterate allows you to pass a state object that you can use however you want. You did not provide your implementation of Base.iterate. It seems obvious to me that it needs to contain a Vector of nodes representing a path from the root node to the current node, otherwise you have no way to backtrack, because you do not store the parent node at the node struct. For a struct similar to yours I would do:

mutable struct Node
    name::String
    next::Union{Node,Nothing}
    first_child::Union{Node,Nothing}
end

Base.iterate(c :: Node) = (c, Node[c])

function Base.iterate(_ :: Node, stack :: Vector{Node})
    cp = last(stack)
    if cp.first_child !== nothing
        return (cp.first_child, push!(stack, cp.first_child))
    else
        if cp.next !== nothing
            stack[end] = cp.next
            return (cp.next, stack)
        else # backtracks
            pop!(stack)
            while !isempty(stack)
                cp = last(stack).next
                if cp !== nothing
                    stack[end] = cp
                    return (cp, stack)
                end
                pop!(stack)
            end
            return nothing
        end
    end
end

data = Node("A",
    nothing,
    Node("B",
        Node("D",
            Node("E",
                nothing,
                Node("F",
                    Node("G", nothing, nothing),
                    nothing
                )
            ),
            nothing
        ),
        Node("C", nothing, nothing)
    )
)

for node in data
    print("$(node.name) ")
end
println()

# A B C D E F G

Answer 2: I am not sure if there is something generally better, it depends on the depth/breath your trees often take. If I would not use a third-party implementation, I would probably use a children :: Vector{Node} implementation.

struct MyNode
    children :: Vector{MyNode}
end

However, Base.iterate needs to pass around and modify both a MyNode stack and a Int stack (the current position in the children vector of each node of the path). This increases the cost traversing it, but improves the memory locality, as nodes with many children have them in contiguous memory. If you have a tree with low depth and with many leafs sharing the same parent nodes (i.e., the nodes in the penultimate level have a lot of children each) then you may see a improvement in the performance here.

2 Likes

Thank you for your answers!
regarding (1.), (2.), (3.): I didn’t know Missing was wrong here, but now i do. Excuse me for node Iwas hastily typing this down :stuck_out_tongue: . Node was just a minimal example imitating a struct of a larger project. last_child is part of that original struct, likely to implement backwards iteration.

Thank you for coding a Base.ierate function, the one I did looked similar. I suppose that Base.iterate can’t be implemented via ordinary recursion then (as in:)

yield_element_from_current_iteration()
Base.iterate(lect_child)
Base.iterate(right_child)
return

I had hoped it were possible to desgin Base.iterate like that.

Julia has a Base.yield but it does not work as the Lua counterpart (I only used the kind of yield you mentioned in Lua, not Python).

However, there is a @yield in ResumableFunctions.jl that may work exactly as you want (based on the yield from C#), and maybe makes much easier to code tree traversal, but I do not know which dark magic is used there and what is the impact in the performance.

iterate is a well defined interface - what happens when you write a for loop is this:

for item in iter   # or  "for item = iter"
    # body
end

is translated to

next = iterate(iter)
while next !== nothing
    (item, state) = next
    # body
    next = iterate(iter, state)
end

so you don’t call iterate again/recursively manually, it’s done for you.

Yes, but with a yield keyword (or macro as it seems the Julia case) it is possible to create a Base.iterate that only calls a @resumable function (which stores the inner state) in the fashion BALLsyman wanted.