Moderator’s note: this post was a response split off from Moving BigFloat to stdlib? - #9
I have a feeling that this is a case of putting the cart before the horse.
That PR seems to be a complex way of trying to solve a problem that doesn’t really need to be solved.
A hash is calculated as a way of quickly finding something, to avoid possibly slower equality checks between two elements. Recently, the fact that it’s OK to let hashes collide, if calculating the hash itself is not fast, has been recognized by the Julia team in a PR about hash fallbacks.
Also, does there really need to be a single compatible hash
calculation that works across different types?
What if instead, there were a trait that would allow selecting a hash function for different types (combinations of types)?
It seems that there are really (at least) three types of hashes needed, one for scalar values, one for iterables, and one for ranges.
Hashes should only be used as a shortcut to avoid calling isequal
if the hashes are compatible (as determined by traits/promotion rules).
For example, if you have a collection of ranges, and their hashes were compatible, then you use those hashes.
If you have a collection with both ranges and vectors, then (maybe depending on the element types), the promotion rules might provide a compatible hash function (but the fallback might be something as simple as a compatible hash of the first and last elements). Depending on the types, the only compatible hash function might be the fallback, something like compat_hash(::Type{Any}, x) = 0
(not actually defined that way, it would need to have a signature with a trait instance)
Have you thought of how arbitrary numeric types (including DecFP, ArbFloats, etc.) could be hashed and compared in a compatible fashion without having to have a type that can hold all values of both without loss (something I don’t think may be possible)?
One way that I’ve found effective (and pretty efficient) is to convert to a string form (using something like the Grisu algorithm), that can be compared in a binary fashion. There are some tricks you can do, to make them compact and compare correctly (i.e. using radix 100, having negative numbers stored with the pairs of digits being stored with a bit to indicate the end, taking care of leading and trailing 0s. etc.).
For a Dict with heterogeneous numeric keys, you can convert all of the numbers, and cache the converted forms, and hash those as binary strings.
@jeff.bezanson What is your opinion of using traits / promotion rules / different hashing functions to handle the heterogeneous collection case, without resorting to widening?
Editted: fixed stupid switch of homo/heterogeneous (shouldn’t write so much without first cup of !)