[ANN] StaticDictTrees.jl: Fast dictionary trees with zero-allocation views

Hi everyone! :waving_hand:

I’d like to announce the StaticDictTrees.jl package. It provides a high-performance dictionary with Tuples as keys, and with the added functionality of providing views into any data subset identified by an incomplete key.

If you’ve ever used a Dict{Tuple, T} and wished you could instantly isolate all entries sharing a specific prefix without an expensive O(N) filter, this package is for you.

An example is worth a million words:

using StaticDictTrees

#  Create a static dict tree
metrics = SDTree((:prod, :web, :cpu) => 42.5,
                 (:prod, :web, :ram) => 78.1,
                 (:prod, :db,  :cpu) => 12.0,
                 (:stage, :web, :cpu) => 8.4)

# Standard full-key lookup
metrics[:prod, :web, :cpu] # 42.5

# Take a zero-allocation, type-stable view of a sub-branch
prod_web = view(metrics, (:prod, :web))

# The view behaves exactly like a relative AbstractDict ...
prod_web[:ram] # 78.1

# ... and any modification in a view is reflected in the original tree
prod_web[:ram] = 99.
metrics[:prod, :web, :ram] # 99.

Comments are welcome!

I understand you don’t like it, no probem!! :wink:

Deletion is feasible, it works, its availability do not harm the performance of other operations (lookup, insert and updates), and most importantly it is clearly mentioned that it is by far the slowest operation.
In short: it can be useful to delete just one entry rather than recreating a huge structure from scratch.
So why refusing?

I was not aware of Base._unsetindex!, thanks for te suggestion!
But I want to use the Array exposed interface, rather than relying on internal machinery.

This would be in contrast with the need to always have all the values stored in a contiguous vector.

Well, there possibly is a misunderstanding here: this is not a 5-minute chat with Gemini being copy/pasted and published, it’s something it took me a lot of time to conceive and implement despite the help from AI, and I share it in the hope it is useful to someone else.

Also, I don’t think the code is sloppy at all (but I am surely biased…).

But I understand your concerns, and I am curious to know whether there is a clear guideline on registering / advertising packages whose code has been developed with AI help (besides pointing it out).

So this alleviates the need to always have all the values stored in a contiguous vector?

Not sure I follow…
The purpose is to have all values stored in a contiguous vector, and this is what the package does, i.e.:

using StaticDictTrees

#  Create a static dict tree
metrics = SDTree((:prod, :web, :cpu) => 42.5,
                 (:prod, :web, :ram) => 78.1,
                 (:prod, :db,  :cpu) => 12.0,
                 (:stage, :web, :cpu) => 8.4)
values(metrics)

will print

4-element Vector{Float64}:
 42.5
 78.1
 12.0
  8.4

regardless them being in separate branch.

Let me know if this answer your question

But I want to use the Array exposed interface, rather than relying on internal machinery.

You shouldn’t be afraid of using the internal machinery here. Everybody uses something like Base.unset_index! for this job. Without that, it would not be possible to write things like Dict. If you must avoid unexported APIs (e.g. due to policy), then you can ccall – that was exported in 1.0 and therefore will remain valid.

I apologize, I misunderstood the nature of your package. In that case, keep on sharing and having fun! And sorry for me being a spoilsport :frowning:
(…but I think the sharing should be for educational purposes, not for “use this as dependency”)

It’s not the code, it’s the underlying algorithm / datastructure that looks sloppy.

You can also do deletion by moving: If the key/value at index idx needs to go, then do

oldkey = sdtree.keys[idx]
oldlen = length(sdtree.keys)
movedkey = sdtree.keys[oldlen]
sdtree.keys[idx] = movedkey
sdtree.values[idx] = sdtree.values[oldlen]
pop!(sdtree.keys)
pop!(sdtree.values)

sdtree.lookup[movedkey] = idx
delete!(sdtree.lookup, oldkey)

#do the same for branch_lookup

If possible I would avoid it, exactly because I prefer to be on the safe side and write something I entirely understand.

But this would destroy the order on the internal values vector and force me to modify values(::SDTree) to return an iterator rather than a vector.

This was one of my requirements from the beginning and I don’t know if it is worth to trade this to have better performance on delete! and prune!…

Anyway, thanks for raising the point, I would definitely think about it.

Quick update:

  • I modified the delete and prune algorithms to follow @foobar_lv2 suggestion (“deletion by moving”). This destroys the order on the internal values vector but the perfomance improvements are huge!;
  • The SDTree and SDBranch inherit from AbstractDict hence it is fine for values() to return an iterator;
  • I also implemented the values_view() method to return a view on the internal vector with correct order (namely the order elements were inserted in the tree).

New version is 0.1.1.

destroys the order on the internal values vector but the perfomance improvements are huge. Does this preserve contiguity?

Yes, the internal values field is still dense, i.e. contiguous, but adjacent cells are not guaranteed to be in the same order as insertion.

The only ways to get them in insertion order is by:

  • iterating over values();
  • accessing via values_view().