I am writing an immutable type in Julia and have realized, while working on it, that I have an opportunity to go a little crazy with the type parameters. Here’s a contrived example that illustrates the same problem. I’m curious about advice for when this kind of use of parametric types is worth it and when it might be a bad idea. My own intuitions on the problem, for what they are worth, are below.
This question is a good resource but only a partial answer.
Problem. I want to write an efficient immutable type StringTrie <: AbstractSet{String} that implements a set of strings as a trie. A simple implementation might be like this:
# Each node records whether a word ends here, plus (char => child) pairs.
struct TrieNode
    endsword::Bool
    children::Tuple   # of Pair{Char,TrieNode}; deliberately left untyped
end
# Build a (sub)trie containing the single string s.
function TrieNode(s::String)
    if isempty(s)
        return TrieNode(true, ())
    else
        # NOTE: s[1] / s[2:end] assume the first character is one code unit
        # (fine for ASCII; general UTF-8 would need nextind).
        return TrieNode(false, (s[1] => TrieNode(s[2:end]),))
    end
end
# Return the index of the child keyed by c, or 0 if there is none.
function _find(n::TrieNode, c::Char)
    for (ii, (k, v)) in enumerate(n.children)
        k == c && return ii
    end
    return 0
end
# Membership: walk down the trie one character at a time.
Base.in(s::String, n::TrieNode) = begin
    isempty(s) && return n.endsword
    (c, s) = (s[1], s[2:end])
    ii = _find(n, c)
    return ii == 0 ? false : in(s, n.children[ii][2])
end
# Non-mutating insert: returns a new node, sharing unchanged subtrees.
push(n::TrieNode, s::String) = begin
    if isempty(s)
        return n.endsword ? n : TrieNode(true, n.children)
    else
        (c, s) = (s[1], s[2:end])
        ii = _find(n, c)
        if ii == 0
            return TrieNode(n.endsword, (n.children..., c => TrieNode(s)))
        else
            (k, v) = n.children[ii]
            new_v = push(v, s)
            new_v === v && return n
            ch = n.children
            new_ch = (ch[1:ii-1]..., k => new_v, ch[ii+1:end]...)
            return TrieNode(n.endsword, new_ch)
        end
    end
end
struct StringTrie <: AbstractSet{String}
    n::Int
    root::Union{TrieNode,Nothing}
end
StringTrie() = StringTrie(0, nothing)
StringTrie(s::Vararg{String,N}) where {N} = reduce(push, s, init=StringTrie())
Base.length(t::StringTrie) = t.n
Base.in(s::String, t::StringTrie) = t.root === nothing ? false : in(s, t.root)
push(t::StringTrie, s::String) = begin
    t.root === nothing && return StringTrie(1, TrieNode(s))
    new_root = push(t.root, s)
    return new_root === t.root ? t : StringTrie(t.n+1, new_root)
end
A simple test confirms this mostly works:
strings = ["foo", "bar", "baz", "qux", "qu", "ba", "foobar", ""]
st = StringTrie(strings...)
([s in st for s in strings], [reverse(s) in st for s in strings])
#[Out] := (Bool[1, 1, 1, 1, 1, 1, 1, 1], Bool[0, 0, 0, 0, 0, 0, 0, 1])
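Since push is non-mutating, the original trie should also be left untouched, with unchanged subtrees shared between versions. A quick sanity check I would expect to pass (just illustrative, not part of the implementation):

st2 = push(st, "quux")
(length(st), length(st2))        # (8, 9): the original is unchanged
("quux" in st2, "quux" in st)    # (true, false)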
An alternative would be to lean much further into the type system and give the compiler as much information as possible. For that, one could use the following code.
struct TrieNode{T <: Tuple}
    endsword::Bool
    children::T   # the full shape of the children is now part of the type
end
# (Equivalent to the constructor Julia generates automatically; written out
# here to make the parameterization explicit.)
function TrieNode(endsword::Bool, children::T) where {T<:Tuple}
    return TrieNode{T}(endsword, children)
end
The rest of this implementation is almost identical to the original, but it gives the compiler all the type information it could possibly want. As a consequence, however, typeof(st.root) for even a small trie becomes something awful:
strings = ["foo", "bar", "baz", "qux", "qu", "ba", "foobar", ""]
st = StringTrie(strings...)
typeof(st.root)
# [Out] := TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{}}}}}}}}}}}}}}}}}},Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{}}},Pair{Char,TrieNode{Tuple{}}}}}}}}},Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{Pair{Char,TrieNode{Tuple{}}}}}}}}}}}
Note, however, that the API of StringTrie is untouched by this, so any problem here is purely a performance one.
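One wrinkle I noticed: if StringTrie keeps the field root::Union{TrieNode,Nothing}, that field is now abstractly typed (TrieNode without its parameter is a UnionAll), so every access through it goes through dynamic dispatch. The parameter could be propagated upward, at the cost of push changing the container's type on every insert; a minimal sketch of what I mean:

struct StringTrie{R <: Union{TrieNode, Nothing}} <: AbstractSet{String}
    n::Int
    root::R   # now concretely typed, but push must return a new StringTrie type
end
StringTrie() = StringTrie(0, nothing)

Whether the dispatch saved there is worth the extra specialization is part of the same question.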
My understanding is that this implementation of TrieNode should be laid out completely flat in memory as Chars and Bools, with no pointer indirection for the hierarchy, which is a big plus. However, it is unclear to me how much compile time I am sacrificing by going this route. Instinctively, a trie like this with thousands of entries seems untenable.
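If that picture is right, it should be checkable directly: every node type in the fully parameterized version is an immutable composition of Bool, Char, and tuples, so I would expect something like

isbitstype(typeof(st.root))   # true for the parameterized version, I believe

to hold, whereas the original version with an untyped children::Tuple field cannot be isbits.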
I also suspect that, with some clever technomancy, one could come up with intermediate designs that lean on the compiler less than the last case but more than the first. What is the ideal to aim for here? Also, what is the best way to benchmark this? (Does BenchmarkTools' @benchmark include or ignore compile time? Can the two be benchmarked separately?) And if my code generates types everywhere, will the compiler eventually free them, or will they build up indefinitely?
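For reference, this is roughly how I have been measuring, though I am not confident it separates the two costs correctly (assumes the BenchmarkTools package):

using BenchmarkTools
# In a fresh session the first call includes compilation; on Julia >= 1.6,
# @time also reports the fraction of time spent compiling.
@time st = StringTrie("foo", "bar", "baz")
# My understanding is that @benchmark re-runs the expression many times after
# a warm-up, so its samples should largely exclude one-off compilation.
@benchmark in("foo", $st)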
Thanks in advance!