I have a very simple package I am developing, say defined like this:
module MyPkg

struct A
    a::Int
end

end # module
Now in a Julia session, I import this package, and compute a hash of an object:
julia> using MyPkg # first usage, precompiles
julia> hash(MyPkg.A(5)) # returns a hash (A is not exported, so it is qualified)
Next, I make a small modification to the package source code:
module MyPkg

struct A
    a::Int
end

f() = 2 # modification of package source code

end # module
The next time I load the package, precompilation is triggered again:
julia> using MyPkg # precompiles again
julia> hash(MyPkg.A(5)) # returns a different hash than before!
Even though the type A has not changed, I now get a different hash for A(5). Two questions:
Why is this happening? Why is the hash not the same on the second usage?
How can I get a “deterministic” hash that doesn’t change whenever the package is precompiled?
Note that these are “minor” modifications: neither the package identity (name, UUID) nor the definition of type A has changed. So what does the hash depend on that is changing here?
AutoHashEquals has to be applied where the type is defined. What I meant is that I would like a generic hash that works on a type defined elsewhere (probably without AutoHashEquals) and does not have this instability.
Yes, but that would produce too many collisions. I want a hash that satisfies the usual requirements of a hash function (as few collisions as possible) while staying consistent when the package defining a type is precompiled again, as long as the type definition itself does not change.
What you are expecting is a persistent hash function, one whose values stay the same across sessions, which is not really what hash is meant for. Look at something like https://github.com/staticfloat/SHA.jl for persistent hashing. As a core/builtin function, hash should aim to be very fast without excess functionality. I actually use the fact that it can produce different hashes in different Julia sessions to uncover bugs in code. Not my code, of course…
Thanks. Is https://github.com/staticfloat/SHA.jl the same as the stdlib SHA? Unfortunately, it seems these functions only accept certain kinds of arguments (strings, Array{UInt8}, or IO objects).
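To make the point about argument types concrete, here is a small sketch using the SHA stdlib. It only shows the three input kinds mentioned above (a string, a byte vector, and an IO object) all yielding the same digest for the same bytes; the variable names are just for illustration.

```julia
using SHA

# The same three bytes ("abc") supplied in the three accepted forms.
from_string = bytes2hex(sha256("abc"))
from_bytes  = bytes2hex(sha256(UInt8[0x61, 0x62, 0x63]))
from_io     = bytes2hex(sha256(IOBuffer("abc")))

println(from_string)
println(from_string == from_bytes == from_io)
```

Anything else, such as a user-defined struct, has to be converted to one of these forms first.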
Yeah, it might be in the stdlibs now actually, it's been a while since I used it directly… You might need to define a method that dispatches on your types, or a generic method that iterates over the propertynames of a value to build the SHA.
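A rough sketch of the generic approach suggested above, assuming the SHA stdlib is available. The names `stable_write` and `stable_hash` are hypothetical helpers, not part of SHA.jl, and the local struct `A` is only a stand-in for `MyPkg.A`. The idea is to encode a value using nothing but type names, field names, and field values, and then run SHA-256 over those bytes. It iterates over fields rather than properties, handles only a few leaf types, and writes numbers in native byte order, so treat it as a starting point rather than a complete solution.

```julia
using SHA

# Hypothetical helper (not part of SHA.jl): write an encoding of `x` to `io`
# that depends only on type names, field names, and field values.
function stable_write(io::IO, x)
    T = typeof(x)
    write(io, string(nameof(T)), "(")
    if x isa Union{Bool, Base.BitInteger, Float16, Float32, Float64}
        write(io, x)                         # raw bytes, native byte order
    elseif x isa Union{AbstractString, Symbol}
        write(io, string(x))
    elseif isstructtype(T) && fieldcount(T) > 0
        for name in fieldnames(T)
            write(io, string(name), "=")
            stable_write(io, getfield(x, name))
        end
    else
        # Arrays, pointers, etc. are deliberately left out of this sketch.
        error("stable_write: unsupported type $T")
    end
    write(io, ")")
    return nothing
end

# Hypothetical helper: hex SHA-256 digest of the encoding above.
function stable_hash(x)
    io = IOBuffer()
    stable_write(io, x)
    return bytes2hex(sha256(take!(io)))
end

struct A      # stand-in for MyPkg.A
    a::Int
end

println(stable_hash(A(5)))
```

Because only names and values enter the digest, the result should not depend on when the defining package was precompiled. It is, however, much slower than `hash`, and two unrelated types with the same name and fields would collide.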
That would be an option, but note the performance trade-off: in many cases objectid is a good choice.
Generally, I think it would make sense to have several predefined implementations of hashing (and of equality, to be consistent) that a type can opt into via traits.
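As an illustration of what such an opt-in could look like, here is a minimal sketch of the trait pattern. Everything in it (`HashStyle`, `DefaultHash`, `FieldwiseHash`, `hashstyle`, `myhash`, `myisequal`, and the example type `M`) is hypothetical and does not exist in Base or any package I know of.

```julia
# Hypothetical trait types selecting a hashing strategy.
abstract type HashStyle end
struct DefaultHash   <: HashStyle end   # defer to Base.hash / isequal
struct FieldwiseHash <: HashStyle end   # combine the hashes of all fields

# Types opt in by adding a method to `hashstyle`; the default changes nothing.
hashstyle(::Type) = DefaultHash()

myhash(x, h::UInt = zero(UInt)) = myhash(hashstyle(typeof(x)), x, h)
myhash(::DefaultHash, x, h::UInt) = hash(x, h)
function myhash(::FieldwiseHash, x, h::UInt)
    h = hash(nameof(typeof(x)), h)
    for i in 1:nfields(x)
        h = myhash(getfield(x, i), h)
    end
    return h
end

# Equality defined through the same trait, so it stays consistent with myhash.
myisequal(x, y) = typeof(x) == typeof(y) && myisequal(hashstyle(typeof(x)), x, y)
myisequal(::DefaultHash, x, y) = isequal(x, y)
myisequal(::FieldwiseHash, x, y) =
    all(myisequal(getfield(x, i), getfield(y, i)) for i in 1:nfields(x))

# Example: a mutable type, which Base hashes and compares by identity.
mutable struct M
    v::Int
end
hashstyle(::Type{M}) = FieldwiseHash()

println(myhash(M(1)) == myhash(M(1)))
println(myisequal(M(1), M(1)))
```

The design choice here is that the defining package (or a user, for an outside type) picks a strategy with a single `hashstyle` method, and hashing and equality both dispatch on it, so the two cannot drift apart.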