In Metatheory.jl we have e-graphs, which are basically made of some Dicts
. We are encountering some performance bottlenecks because of the implementation of Dict
, and we realized that optimized data structures could go a lot faster.
for example we have
const Id = UInt64
struct EClass{D}
id::Id
nodes::Vector{Vector{Id}}
parents::Vector{Pair{Vector{Id},Id}}
data::Union{D,Nothing}
end
struct EGraph
classes::Dict{Id,EClass{Analysis}}
...
end
It should always be that egraph.classes[x].id == x
.
Since this data structure is basically accessed everywhere, hashing of UInt64
is becoming a bottleneck!!
There’s only 3 places where we perform operations that are not getindex
on classes
:
- Insertion of a new eclass. The
id
is fresh (incremented by 1) and is not replacing any old element in the dict. - Merging of 2 eclasses. this is why we are not using a vector!! because the length of the
classes
dict might be a lot less than the largestid
inserted. Given 2 ids,id_1
andid_2
, we remove the element for keyid_2
from theclasses
dict, and mutate the value inid_1
. - eclass.data is updated. a value for an
id
is replaced by a new one sinceEClass
is immutable. (can be made mutable).
We are trying to reduce the time required by getindex(g.classes, id::UInt64)
. Since we’re already accessing the dict via an UInt64, is there really the need for hashing?
The entirety of the algorithm does a LOT of accesses to the classes
Dict (and other Dicts) and we’re trying to kill the time (and allocations) required by accessing this Dict.
It can be as low level as possible. +1 if anyone knows something that avoids the garbage collector, since stuff should be freed when removed from the Dict.
I’ve implemented the trick described in Dictionary with custom hash function - #6 by Oscar_Smith
"""
There's no need of computing hash for dictionaries where keys are UInt64.
Wrap them in an immutable struct that overrides `hash`.
TODO: this is rather hacky. We need a more performant dict implementation.
Trick from: https://discourse.julialang.org/t/dictionary-with-custom-hash-function/49168
"""
struct IdKey
val::Id
end
Base.hash(a::IdKey, h::UInt) = xor(a.val, h)
Base.:(==)(a::IdKey, b::IdKey) = a.val == b.val
Since that discussion links to [WIP/RFC] broaden the scope of `hash` by rfourquet · Pull Request #37964 · JuliaLang/julia · GitHub - a POC of custom hash types in dictionaries, and introducing an extra struct is rather tricky, are you aware of any other approach that solves this issue? Basically all of our hashes are already cached before dictionary accesses.
We just need the fastest possible UInt64 → UInt64 dictionary ever.