Hashes change whenever package is pre-compiled

I have a very simple package I am developing, say defined like this:

module MyPkg
struct A
   a::Int
end
end # module

Now in a Julia session, I import this package, and compute a hash of an object:

julia> using MyPkg # first-usage, precompiles
julia> hash(MyPkg.A(5)) # returns a hash (A is not exported, so it is qualified here)

Next, I do some modification of the package source code,

module MyPkg
struct A
   a::Int
end
f() = 2 # modification of package source code
end # module

So next time I use this package, a precompilation is triggered again:

julia> using MyPkg # precompiles again
julia> hash(MyPkg.A(5)) # returns a different hash than before!

Even though the type A has not changed, for some reason I now get a different hash for A(5). Two questions:

  1. Why is this happening? Why is the hash not the same in the second usage?
  2. How can I get a “deterministic” hash that doesn’t change whenever the package is precompiled?

Note that here I am only making “minor” modifications, in that neither the package identity (name, UUID) nor the definition of the type A has changed. So what does the hash depend on that is changing here?

The fallback implementation of hash uses objectid, which is very fast but can change from session to session.

The solution is writing your own hash function.
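As a minimal sketch of what "writing your own hash function" could look like for the type A from the question: the hash below is built only from the field value and a fixed constant, so it does not go through objectid of the struct. The constant name A_HASH_SEED is made up for this example, and the struct is redefined here only to keep the snippet self-contained. The result should be reproducible across sessions and recompilations on the same Julia version, but Julia does not promise that integer hashes stay the same between versions.

```julia
struct A
    a::Int
end

# Hypothetical fixed seed (any constant works). It only keeps A(5) from
# hashing to the same value as the bare integer 5.
const A_HASH_SEED = 0x9e3779b97f4a7c15 % UInt

# Hash from the field contents plus the fixed seed; unsigned addition wraps on overflow.
Base.hash(x::A, h::UInt) = hash(x.a, h + A_HASH_SEED)

# Keep equality consistent with hash: equal fields mean equal objects.
Base.:(==)(x::A, y::A) = x.a == y.a

# Usage: equal objects hash equally and work as Dict/Set keys.
hash(A(5)) == hash(A(5))
A(5) in Set([A(5), A(6)])
```

The two-argument method is the one to define; the one-argument hash(x) falls back to it.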

Is it possible to have a generic hash function that doesn’t have this instability?

https://github.com/andrewcooke/AutoHashEquals.jl


Thanks, I think I understand what the problem is. However it seems that the hash should not change if the identity of the object doesn’t change?

AutoHashEquals needs to be used at the definition of the type. I meant I would like to have a generic hash that can be used for an outside type (probably defined without AutoHashEquals), and not have this instability.

Not sure what you mean here. myhash(x) = UInt64(1) would satisfy your criteria.

Yes, but that would have too many collisions. I mean a function that satisfies the usual requirements of a hash function (as few collisions as possible) while still being consistent even if the package defining a type has to be precompiled again (as long as the type definition itself does not change).

What you are expecting is a persistent hash function, which is not really what hash is meant for. Look to something like https://github.com/staticfloat/SHA.jl for supporting persistent hashing. As a core/builtin function, hash should have the goal of being very performant without excess functionality. I use the fact that it can produce different hashes in different Julia sessions to actually uncover bugs in code. Not my code, of course… :smirk:


Thanks. Is https://github.com/staticfloat/SHA.jl the same as the stdlib SHA? Unfortunately it seems these functions only take certain kinds of arguments (strings, Array{UInt8}, or IO objects).

Yeah, it might be in stdlibs now actually, been a while since I used it directly… You might need to create a method to dispatch for your types, or a generic method to iterate over the propertynames of types to build the SHA.
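A rough sketch of the "generic method that iterates over the fields" idea, assuming the stdlib SHA and its sha256 function. The helper names write_canonical and persistent_hash are invented for this example and are not part of SHA. The sketch only handles numbers, strings, symbols, arrays, and plain structs built from those; other types (Dict, pointers, and so on) would need their own methods, and array shape is ignored.

```julia
using SHA  # standard library

# Leaves: write the concrete type and the printed value, both determined by contents.
function write_canonical(io::IO, x::Union{Number,AbstractString,Symbol})
    print(io, typeof(x), ':')
    show(io, x)
end

# Arrays: element by element (shape is ignored in this sketch).
function write_canonical(io::IO, xs::AbstractArray)
    print(io, '[')
    for x in xs
        write_canonical(io, x)
        print(io, ',')
    end
    print(io, ']')
end

# Fallback for plain structs: type name as text, then each field in declaration order.
function write_canonical(io::IO, x::T) where {T}
    isstructtype(T) || error("no canonical form defined for $T")
    print(io, nameof(T), '(')
    for name in fieldnames(T)
        print(io, name, '=')
        write_canonical(io, getfield(x, name))
        print(io, ',')
    end
    print(io, ')')
end

# SHA-256 of the canonical text, returned as a hex string.
function persistent_hash(x)
    io = IOBuffer()
    write_canonical(io, x)
    return bytes2hex(sha256(take!(io)))
end

# Usage with a type like the one in the question.
struct A
    a::Int
end
persistent_hash(A(5))  # 64 hex characters
```

Because the digest depends only on the type name as text and the field contents, it does not involve objectid. It is far slower than hash, which is the trade-off raised in this thread.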

That would be an option, but note the performance trade-off — for many cases objectid is a good choice.

Generally I think it would make sense to have various predefined implementations of hashing (and of course equality, to be consistent), to which one can opt-in with traits.
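To make the trait idea concrete, here is one possible shape of it, purely as a sketch: HashStyle, IdentityHash, FieldwiseHash, and myhash are all hypothetical names, not an existing API. A type opts in by adding a single HashStyle method, which can also be done from outside the package that defines the type.

```julia
# Hypothetical trait: how should values of a type be hashed?
abstract type HashStyle end
struct IdentityHash <: HashStyle end    # default: defer to Base.hash
struct FieldwiseHash <: HashStyle end   # opt-in: hash type name and fields

HashStyle(::Type) = IdentityHash()

# Entry point: dispatch on the trait of the value's type.
myhash(x, h::UInt=zero(UInt)) = myhash(HashStyle(typeof(x)), x, h)

myhash(::IdentityHash, x, h::UInt) = hash(x, h)

function myhash(::FieldwiseHash, x, h::UInt)
    # Mix in the type name as a String, so the result depends on its text only.
    h = hash(String(nameof(typeof(x))), h)
    for name in fieldnames(typeof(x))
        h = myhash(getfield(x, name), h)
    end
    return h
end

# Usage: opt a type in without touching its definition.
struct A
    a::Int
end
HashStyle(::Type{A}) = FieldwiseHash()

myhash(A(5)) == myhash(A(5))
```

A matching equality function would be needed for consistency, as noted above; it is left out to keep the sketch short.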

Could also be implemented in separate packages. Something like SHA mentioned above, if it handled generic types.