Implementing a Trie

I am currently trying to implement a (bitwise?) Trie mapping Ints to some type V, and want to branch k bits at a time (resulting in 2^k children per node).
For a first attempt I went with k = 1 and my nodes looked like this:

mutable struct Node2{V}
  value::V
  left::Union{Nothing,Node2{V}}
  right::Union{Nothing,Node2{V}}
end

This works… ok.
When inserting a node I have to do something along the lines of

if isnothing(N.left)
  N.left = Node2{V}(...)
  ...

and the yellow type annotations in Cthulhu are not very reassuring, although I’m not sure how big of a problem this actually is? Is there any efficient way to assure the compiler that e.g. N.left is/is not a node depending on a bit field?

But then I have the general trie with 2^k children in each node and… how do I do that?
Using a NTuple{2^k, Union{Nothing, Node{k, V}} for all children looks horribly inefficient, both in code and in Cthulhu.
Are there any other options in Julia or is this better implemented in C or by unsafe_store! etc.?

and the yellow type annotations in Cthulhu are not very reassuring, although I’m not sure how big of a problem this actually is?

Yellow means that there is an abstract type in your type inference but it’s considered a “small union” so it will be compiled to efficient instructions. Union{Nothing, X} is the most common practical example of that. Abstract types that are going to lead to dynamic dispatch are red instead.

4 Likes

Thank you @jules !! I remember reading about optimisations for small unions, but the yellow colour looked very warning to me.

Unfortunately that leaves me with the issue of referencing the child nodes.
My initial idea of using an NTuple doesn’t work well as Setfield.jl doesn’t appear to be type-stable (in this case?) and all alternatives run into similar issues.

A StaticArrays MVector of nodes isn’t mutable because they’re non-isbitstype,
a MVector of Ptrs doesn’t work because unsafe_pointer_to_objref isn’t type-stable.
So that leaves me with a Vector, which sounds like a horrible idea but might be worth trying, or deferring to C? :frowning:

Not sure what the full performance implications of this are, but you can leave mutable struct fields uninitialized. Because they store only pointers anyway, I think those are null pointers under the hood and then incur additional null pointer checks (not sure how expensive those are).

julia> mutable struct Node3{V}
         value::V
         left::Node3{V}
         right::Node3{V}
         function Node3(value)
           new{typeof(value)}(value)
         end
       end

julia> Node3(1)
Node3{Int64}(1, #undef, #undef)

julia> Node3(1).left
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getproperty(x::Node3{Int64}, f::Symbol)
   @ Base ./Base.jl:37
 [2] top-level scope
   @ REPL[24]:1

julia> @code_warntype test(Node3(1))
MethodInstance for test(::Node3{Int64})
  from test(node) @ Main REPL[17]:1
Arguments
  #self#::Core.Const(test)
  node::Node3{Int64}
Body::Node3{Int64}
1 ─ %1 = Base.getproperty(node, :left)::Node3{Int64}
└──      return %1

Another alternative you can try, SumTypes.jl

julia> using SumTypes

julia> @sum_type Optional{T} begin
           None
           Some{T}(::T)
       end

julia> full_type(Optional{}) # check how the full type parameterization looks
Optional{Any, 0, 1, UInt8}

julia> mutable struct Node4{V}
         value::V
         left::Optional{Node4{V}, 0, 1, UInt8}
         right::Optional{Node4{V}, 0, 1, UInt8}
       end

julia> function test(node)
           node.left
       end
test (generic function with 1 method)

julia> code_warntype(test, Tuple{Node4{Int}})
MethodInstance for test(::Node4{Int64})
  from test(node) @ Main REPL[17]:1
Arguments
  #self#::Core.Const(test)
  node::Node4{Int64}
Body::Optional{Node4{Int64}, 0, 1, UInt8}
1 ─ %1 = Base.getproperty(node, :left)::Optional{Node4{Int64}, 0, 1, UInt8}
└──      return %1

Thank you again! Those should work - although, is Optional{T} really any different from Union{Nothing, T}? SumTypes are confusing to me in that regard.
And Union{Nothing, T} is indeed what I went with at first,
but ultimately I want to generalise this to a higher number of children (which I should probably turn into a type parameter → Node{N, V} for N children), that’s where the NTuples etc. came from.

The difference is that Optional{Node4{V}, 0, 1, UInt8} is a concrete type compared to Union{Nothing,T}. So at least @code_warntype will look different :wink:

The real benefit of SumTypes is that it forces you to always handle all possible options in the union. This is done by destructuring it using the @cases macro and it will fail at parse time if you haven’t handled all options. Also, if you have multiple possible types that are all isbits, then SumTypes can compute the smallest space needed to store any of them (plus type tag) and the result is again going to be isbits, so contiguously storable in arrays etc. This is not going to happen with Unions in general. If the union is sufficiently small (the compiler decides) you should however get similar speeds in calling the correct function on the value, because the compiler will not dynamically dispatch (with the generic and slow Julia mechanism) but just prepare for calling any of these few possible functions it could determine beforehand.

I tend to anthropomorphize things so I imagine the dynamic dispatch case like:

somfunc(value::Number)

Hey there’s this box with a value here, it could be any Number. Let’s unwrap it and check what type it has… Oh it’s an Int. What was that method of somefunc again that I could call on Ints? Ah there it is. Ok let’s call it.

vs. the small Union

somefunc(value::Union{Int,Float32})

Hey there’s this thing that could be either an Int or a Float32. I’ve got two somefunc methods at the ready for it. It’s an Int, go option 1!

1 Like