I’ve come upon a performance gotcha with Dict
, wherein large amounts of insertions and deletions cause rehashing and allocations even when the load factor is stable.
I don’t know much about hash table implementations, so maybe this is a known weakness of the algorithm chosen. I did notice ndel
is only ever increased (in _delete!
), then used as a condition for rehashing (if more than 3/4 slots are deleted), and then reset to 0 on rehash. Does that mean deleted slots are never re-used? I thought commonly deleted slots are treated as occupied on probe, and empty for reuse on insert.
DataStructures.RobinDict
doesn’t exhibit the same behavior.
Code for validation:
using BenchmarkTools
using DataStructures
function foo(d, reps)
for i in 1:reps
d[i] = 1
delete!(d, i)
end
nothing
end
@btime foo($(Dict{Int, Int}()), 1000)
# 24.200 μs (249 allocations: 42.80 KiB)
@btime foo($(RobinDict{Int, Int}()), 1000)
# 33.101 μs (0 allocations: 0 bytes)