Custom hash function in dict is very slow?

I have tried defining a custom type, as stated in this discourse thread, but it does not change the results. I also don’t think that is needed here as hash and isequal do dispatch correctly to Tuple{Int,Int} because Base defines a more general function.

The issue is not that dispatch is wrong; the issue is that some other unsuspecting code may test (a,b) == (c, d) for some tuple of integers. Your redefinition of Base.isequal now changes the behavior of this code. This is called type piracy.

… collisions …

The issue is that julia hash has a different contract than C++ hash: In C++, the hash function is supposed to minimize collisions, and nothing else. So e.g. hash(x::Int)::Int = x would be a reasonable hash in C++. This means that you cannot simply truncate the hash provided by an unknown user-provided type, you need to apply a random function to this hash (like e.g. a hash function). Imo this is bloody awful API design.

In julia, the contract of hash is that it should (1) minimize collisions, and (2) be reasonably random.

This is important, because Dict addresses its hashtable by just truncating to the lower bits of your user-provided hash-value (how many bits are taken depends on the size of the hashtable). Since your hashtable is pretty small, only the lower bits of min(t[1], t[2]) will be used, and max(t[1], t[2]) will not go into the actual adddressing.

1 Like