Efficient recursive iteration over self referencing struct

Suppose i have a Node type that may have a reference to itself on one of its fields. What is the most efficient way to implement Base.iterate for this Node type, such that at every iteration we get the next child Node?

julia> const Maybe{T} = Union{T, Nothing}
Union{Nothing, T} where T

julia> struct Node{T}
           name::String
           value::T
           child::Maybe{Node}
       end

julia> Base.iterate(::Node) = ??

Thanks in advance!

1 Like

I was trying to overengineer the solution for this problem, and it turns out it is more simpler than i thought, but iā€™m still no sure if thereā€™s a more efficient way, however, this works for now:

julia> function Base.length(node::Node)
           len = 0
           while !isnothing(node)
               len += 1
               node = node.child
           end
           return len
       end

julia> Base.iterate(node::Node, state::Node = node) = state, state.child

julia> Base.iterate(node::Node, ::Nothing) = nothing

julia> node = Node("x", 1, Node("y", 2, Node("z", 3)))
Node{Int64}("x", 1, Node{Int64}("y", 2, Node{Int64}("z", 3, nothing)))

julia> length(node)
3

julia> collect(node)
3-element Vector{Any}:
 Node{Int64}("x", 1, Node{Int64}("y", 2, Node{Int64}("z", 3, nothing)))
 Node{Int64}("y", 2, Node{Int64}("z", 3, nothing))
 Node{Int64}("z", 3, nothing)

When you say ā€œmore efficientā€, what exactly are you imagining the inefficiency to be? There are some ways you can make a linked list more efficient cache-wise, but they require quite a few changes. Just accessing the struct member and checking whether itā€™s nothing is probably as good as you can get with just this approach.

1 Like

By ā€œmore efficientā€, i mean fast, a non-naive solution. Iā€™m not good at writing efficient (fast) code. For most of the time in the past i didnā€™t care much about effiency and optimization, as long as my code was working it was fine, but now i want to be better at write efficient code. So if anybody has a better solution for the problem that i described explaining why is better it would be of much help.

(Sorry for my english: i actually speak portuguese)

I see! Well, the inefficiency of linked lists is inherent to the datastructure. What youā€™re asking about is how you can make a linked list faster in and of itself, and the ā€œproperā€ answer to that would be ā€œdonā€™t use a linked listā€ (assuming you want to make reading faster). Another complication is that linked lists truly are better for some workloads, but this depends on how you want to use them. There is no One-size-fits-all perfect datastructure.

Are you familiar with Big-O notation?

1 Like

I do not remember seeing a fast linked list implementation, I should write a package with one, but I believe I do not have the time for now.

The fastest way I know to implement a linked list is with two Vector, it works very well if the list starts populated and mostly suffer removals, and/or the insertions are few and distributed among the whole list. However, if you need to insert a lot of elements between other elements already present you either do your way or maybe go for OrderedCollections.jl.

The two-Vector way consists in a data Vector{T} which will keep the elements (no initialization is necessary) and a next Vector{Int} which points to the next position from the current position. If you want to start with everything populated you just create a data Vector with the elements in order, and 1:length(data) as the next. It can be circular (the last next can be one) or not (use 0 or other sentinel value in the last next position). If you think you will add elements, you can make it double the size, initialize only the odd data positions, and have next be [3, 0, 5, 0, 7, ...]. So if you can put new elements between each pair of current elements and future shifts will the the smallest possible. Is this understandable? This is just a very bare-bones explanation.

1 Like

Interesting, iā€™m know about Data Structures, but iā€™m still noob when it comes to choose which one is better for the job.

Yes, iā€™m familiar.

I think i understand what you mean. One day i watched a video about implementing a GUI library in Rust using Data-Oriented Programming, so instead of creating a wigdet structure with references to its children and parents, there was another structure that behaves as a ā€˜managerā€™ which stored each widget on a Vector and the child of each widget as indices on another Vector. At the time i did not understanded correctly, but now it starting to make sense. Thanks for your answer.

Not my best work, but supports sparsity, circularity (given one or more starting elements), and iteration. May implement removal and insertion after.

module FastLinkedLists

struct SingleLL{T, N}
	data :: Vector{T}
	next :: Vector{N}
end

"""
    function SingleLL(data, ::Type{N} = Int; kwargs...) where {T, N}

Create a linked list with unidirectional links (forward). NOTE: starting
with an empty circular buffer is no supported.
"""
function SingleLL(
	data :: Vector{T}, ::Type{N} = Int;
	copy_data = true, sparsity_factor :: Int = 1,
	circular = false
) where {T, N}
	if sparsity_factor < 1
		throw(ArgumentError("sparsity_factor should be one or more"))
	end
	if !copy_data && sparsity_factor > 1
		throw(ArgumentError(
			"You cannot disable copy_data and have sparsity_factor greater than one" *
			", use the default constructor for this kind of fiddling."
		))
	end
	if circular && isempty(data)
		throw(ArgumentError(
			"You cannot start with an empty circular buffer, the circularity is " *
			"engraved into the elements, not the list itself."
		))
	end
	sparsity_factor = convert(N, sparsity_factor)

	isempty(data) && return SingleLL{T, N}(copy_data ? copy(data) : data, N[])

	if isone(sparsity_factor)
		data = copy_data ? copy(data) : data
		next = collect(convert(N, 2):convert(N, length(data)))
		next[end] = circular ? one(N) : zero(N)
		return SingleLL{T, N}(data, next)
	end

	n = length(data)
	m = n * sparsity_factor
	data_ = Vector{T}(undef, m)
	next = zeros(N, m)
	j :: Int, i :: Int = 1, 1
	while i <= n
		data_[j] = data[i]
		next[j] = j + sparsity_factor
		j = next[j]
		i += 1
	end
	next[j - sparsity_factor] = circular ? one(n) : zero(n)

	return SingleLL{T, N}(data_, next)
end

import Base.iterate
function Base.iterate(l :: SingleLL{T, N}) where {T, N}
	isempty(l.next) && return nothing
	i = 1
	while iszero(l.next[i]); i += 1; end
	return (l.data[i], l.next[i])
end

function Base.iterate(l :: SingleLL{T, N}, i) where {T, N}
	iszero(i) && return nothing
	return (l.data[i], l.next[i])
end

end # module
1 Like

Thanks for your answer, this works for me too!

Is the output of collect correct? I was expecting to see a list of (String, Integer) pairs.

Well, the iteration returns the current Node, but if you want to return only the name and value, you do something like:

julia> Base.iterate(node::Node, state::Node = node) = state.name => state.value, state.child

julia> collect(node)
3-element Vector{Any}:
 "x" => 1
 "y" => 2
 "z" => 3

For my specific use case, i realized that having an Array where every next Node is a child of the previous Node is much better than having a linked list, because my situation is much more simpler, since every Node can only have one and only one child or no child.