Why is using a custom hash function in a Dict so slow?
using BenchmarkTools
import Base: hash, isequal
rtuple() = (trunc(Int, rand() * 10_000), trunc(Int, rand() * 10_000)) # random tuple with entries in [0, 10_000)

# order-insensitive hash: pack both values into one UInt64, smaller value in the low 32 bits
function Base.hash(t::Tuple{Int,Int})
    a = t[1] % UInt64
    b = t[2] % UInt64
    return ifelse(t[1] < t[2], (b << 32) | a, (a << 32) | b) % UInt
end

# order-insensitive equality, consistent with the hash above
function Base.isequal(a::Tuple{Int,Int}, b::Tuple{Int,Int})
    return min(a[1], a[2]) == min(b[1], b[2]) && max(a[1], a[2]) == max(b[1], b[2])
end

function addkeys(d, l)
    for t in l
        d[t] = 1
    end
end

function bench()
    l = [rtuple() for i in 1:100_000]
    @btime hash.($l)
    a = hash.(l)
    display(length(unique(a))) # not many collisions
    d = Dict{Tuple{Int,Int},Int}()
    @btime addkeys($d, $l)
end
bench()
Output with the custom hash function:
37.160 μs (2 allocations: 781.33 KiB)
99906
3.634 s (0 allocations: 0 bytes)

Output with the default hash function:
399.578 μs (2 allocations: 781.33 KiB)
99960
1.445 ms (0 allocations: 0 bytes)
So the custom hash function itself is fast (about 10x faster than the default hash) and produces barely any extra collisions (99906 vs. 99960 unique hashes for 100,000 keys). However, when the same tuples are inserted as Dict keys, it is roughly 2500x slower than the default hash (3.6 s vs. 1.4 ms).
I have tried defining a custom type, as suggested in this discourse thread, but it does not change the results. I also don't think that is needed here, since hash and isequal already dispatch to the Tuple{Int,Int} methods above (Base only defines more general methods for tuples).
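As a quick REPL sketch (not part of the benchmark above), the dispatch claim can be checked with @which and by testing the hash for symmetry:

using InteractiveUtils # for @which when run from a script (loaded automatically in the REPL)

@which hash((1, 2))            # should point at the Tuple{Int,Int} method defined above
@which isequal((1, 2), (2, 1)) # likewise for isequal
hash((3, 5)) == hash((5, 3))   # true with the custom hash; the default tuple hash is order-sensitive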
Thanks
Edit: I should mention that the commutative hashing scheme is not mine; it comes from this stackoverflow post.
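For reference, a minimal sketch of what the commutative scheme gives you: with the hash and isequal methods above, (a, b) and (b, a) behave as the same Dict key.

d = Dict{Tuple{Int,Int},Int}()
d[(3, 5)] = 1
d[(5, 3)]         # returns 1, since (5, 3) hashes and compares equal to (3, 5)
haskey(d, (5, 3)) # true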