Custom optimized hash tables with UInt64 keys, lower level than Dict for ultra speed

0x0f0f0f · May 4, 2024, 1:19pm

In Metatheory.jl we have e-graphs, which are basically made of some Dicts. We are encountering some performance bottlenecks because of the implementation of Dict, and we realized that optimized data structures could go a lot faster.

for example we have

const Id = UInt64

struct EClass{D}
  id::Id
  nodes::Vector{Vector{Id}}
  parents::Vector{Pair{Vector{Id},Id}}
  data::Union{D,Nothing}
end

struct EGraph 
    classes::Dict{Id,EClass{Analysis}}
    ...
end

It should always be that egraph.classes[x].id == x.

Since this data structure is basically accessed everywhere, hashing of UInt64 is becoming a bottleneck!!

There’s only 3 places where we perform operations that are not getindex on classes:

Insertion of a new eclass. The id is fresh (incremented by 1) and is not replacing any old element in the dict.
Merging of 2 eclasses. this is why we are not using a vector!! because the length of the classes dict might be a lot less than the largest id inserted. Given 2 ids, id_1 and id_2, we remove the element for key id_2 from the classes dict, and mutate the value in id_1.
eclass.data is updated. a value for an id is replaced by a new one since EClass is immutable. (can be made mutable).

We are trying to reduce the time required by getindex(g.classes, id::UInt64). Since we’re already accessing the dict via an UInt64, is there really the need for hashing?
The entirety of the algorithm does a LOT of accesses to the classes Dict (and other Dicts) and we’re trying to kill the time (and allocations) required by accessing this Dict.

It can be as low level as possible. +1 if anyone knows something that avoids the garbage collector, since stuff should be freed when removed from the Dict.

I’ve implemented the trick described in Dictionary with custom hash function - #6 by Oscar_Smith

"""
There's no need of computing hash for dictionaries where keys are UInt64.
Wrap them in an immutable struct that overrides `hash`.

TODO: this is rather hacky. We need a more performant dict implementation.

Trick from: https://discourse.julialang.org/t/dictionary-with-custom-hash-function/49168
"""
struct IdKey
  val::Id
end
Base.hash(a::IdKey, h::UInt) = xor(a.val, h)
Base.:(==)(a::IdKey, b::IdKey) = a.val == b.val

Since that discussion links to [WIP/RFC] broaden the scope of `hash` by rfourquet · Pull Request #37964 · JuliaLang/julia · GitHub - a POC of custom hash types in dictionaries, and introducing an extra struct is rather tricky, are you aware of any other approach that solves this issue? Basically all of our hashes are already cached before dictionary accesses.

We just need the fastest possible UInt64 → UInt64 dictionary ever.

nsajko · May 4, 2024, 1:41pm

A MWE with profiler flamegraph would be nice.

Why UInt64? Why not a narrower type?

Have you considered using a concrete type instead of a Union? Maybe SumTypes.jl?

TBH that’s an issue with the design of your structs. Obviously Id should have been a type parameter instead of a global name. Better fix it sooner rather than later.

nhz2 · May 4, 2024, 1:49pm

I haven’t done any benchmarking, but maybe something like the data structure used in slotmap - Rust would have less overhead?

stevengj · May 4, 2024, 1:54pm

You can easily define a custom hash function that only does xor: Dictionary with custom hash function - #4 by stevengj

0x0f0f0f · May 4, 2024, 2:36pm

A MWE with profiler flamegraph would be nice.

On branch 3.0 Release by 0x0f0f0f · Pull Request #185 · JuliaSymbolics/Metatheory.jl · GitHub

using Metatheory, BenchmarkTools

t = @theory a b begin
  a + b --> b + a
  a * b --> b * a
  a + 0 --> a
  a * 0 --> 0
  a * 1 --> a
end

using BenchmarkTools

p = SaturationParams(; timer = false)

function simpl(ex)
  g = EGraph(ex)
  saturate!(g, t, p)
  extract!(g, astsize)
end

ex = :(0 + (1 * foo) * 0 + (a * 0) + a)

simpl(ex)

@profview simpl(ex)

Separating out the troublesome parts of the algorithm doesn’t really make sense as of now because it’s a quite complex system that involves compiled pattern matching and I’m benchmarking all the components together.

Why UInt64? Why not a narrower type?

It’s a trick to pack everything efficiently in Vector{UInt64}.

const Id = UInt64

"""
    const VecExpr = Vector{Id}

An e-node is a `Vector{Id}` where:
* Position 1 stores the hash of the `VecExpr`.
* Position 2 stores the bit flags (`isexpr` or `iscall`).
* Position 3 stores the signature
* Position 4 stores the hash of the `head` (if `isexpr`) or node value in the e-graph constants.
* The rest of the positions store the e-class ids of the children nodes.
"""
const VecExpr = Vector{Id}

This allowed for an extreme speedup compared to using structs. See this file This is also because all of these values also need to be pushed to and popped from a buffer hundreds of thousands (if not millions) of times. This by now is restricting the library to work only on systems where hashes are UInt64 but that’s not much of an issue, as everything is vectorized as bytes.

TBH that’s an issue with the design of your structs. Obviously Id should have been a type parameter instead of a global name. Better fix it sooner rather than later.

I see what you mean, but considering how VecExpr are used all around, and that the substantial speedup came by packing things in contiguous segments of memory, I’m not really sure if that is doable. It’s should be as low level, close to bytes and memory as possible.

0x0f0f0f · May 4, 2024, 2:40pm

Well, Metatheory.jl is basically an extended rewrite of GitHub - egraphs-good/egg: egg is a flexible, high-performance e-graph library from Rust in Julia, integrating with all our nice language features and Symbolics.jl / SymbolicUtils.jl compatible rewrite rule language and term interface (TermInterface.jl)

See the preprint for version 3.0 https://arxiv.org/pdf/2404.08751

We do have automatic benchmarks that run against the Rust implementation 3.0 Release by 0x0f0f0f · Pull Request #185 · JuliaSymbolics/Metatheory.jl · GitHub

You can see that we’re pretty close to Rust in performance, but some improvements can still be achieved.

Benchmark Results

	egg-sym	egg-cust	MT@572e2f625bf…	MT@master	egg-sym/MT@572…	egg-cust/MT@57…	MT@master/MT@5…
egraph_addexpr	1.46 ms		5.18 ms	16 ms	0.281		3.08
basic_maths_simpl2	14.1 ms	5.11 ms	22.2 ms	1.07e+03 ms	0.637	0.23	48.3
prop_logic_freges_theorem	2.67 ms	1.56 ms	2.5 ms	38.8 ms	1.07	0.624	15.5
calc_logic_demorgan	62.3 μs	34.4 μs	75.7 μs	474 μs	0.824	0.455	6.26
calc_logic_freges_theorem	6.28 ms	2.87 ms
basic_maths_simpl1	6.78 ms	2.81 ms	5.65 ms	57.5 ms	1.2	0.497	10.2
egraph_constructor	0.0882 μs		0.739 μs	0.599 μs	0.119		0.81
prop_logic_prove1	37.4 ms	14.3 ms	73.8 ms	8.88e+03 ms	0.508	0.193	120
prop_logic_demorgan	82.1 μs	44.9 μs	100 μs	821 μs	0.821	0.449	8.21
while_superinterpreter_while_10			17.1 ms	98 ms			5.74
prop_logic_rewrite			110 μs	123 μs			1.12
time_to_load			70.9 ms	101 ms			1.43

Benchmarks are here Metatheory.jl/benchmark/benchmarks.jl at ale/3.0 · JuliaSymbolics/Metatheory.jl · GitHub

Topic		Replies	Views
Dictionary with custom hash function General Usage question , dictionary , hash	7	2484	November 23, 2023
Performance issues when working with dict Performance dictionary	11	1663	November 16, 2022
Custom hash function in dict is very slow? Performance	3	892	April 20, 2021
Hashing for big structs is slow - any alternative? Performance hash	12	1251	January 25, 2021
Small <: AbstractDict implementation General Usage question	9	1104	February 13, 2019

Custom optimized hash tables with UInt64 keys, lower level than Dict for ultra speed

Benchmark Results

Related topics