Did something change with respect to hashing since 1.4.2? I have code that checks whether data have already been computed by checking a hash of the arguments used to compute the data, and I appear to be getting different hash values in 1.5.0 compared to 1.4.2.
Perhaps related to this, since the change seems to be related to hashing arrays.
In 1.4.2:
julia> hash([1,2,3])
0xecc5186e7be222c6
julia> hash.([1,2,3])
3-element Array{UInt64,1}:
0x02011ce34bce797f
0x5a94851fb48a6e05
0x3688cf5d48899fa6
In 1.5.0:
julia> hash([1,2,3])
0xd22721a98cab7f9d
julia> hash.([1,2,3])
3-element Array{UInt64,1}:
0x02011ce34bce797f
0x5a94851fb48a6e05
0x3688cf5d48899fa6
Hashing is not guaranteed to be stable across Julia versions.
That’s a really good point. I guess for my code I’ll just have to stick to 1.4.2 for now and think of a better approach in future versions.
If the values are always arrays of ints or something comparable, then you could just use `write` to get a “canonical” binary representation and use CRC32 or SHA to hash that binary data. If the data is more complex, you could use BSON to serialize it and then hash the resulting BSON data.
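A minimal sketch of the `write`-then-hash idea, assuming the data is a plain `Array` of a bits type. The helper name `stable_hash` is hypothetical, and SHA-256 from the `SHA` standard library is just one possible choice of digest:

```julia
using SHA  # standard library

# Hypothetical helper: digest of the raw element bytes of an array.
# Assumes a plain Array whose element type is a bits type (Int, Float64, ...).
function stable_hash(a::Array{T}) where {T<:Real}
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    io = IOBuffer()
    write(io, a)                            # elements in memory order, host byte order
    return bytes2hex(sha256(take!(io)))     # 64-character hex string
end

h = stable_hash([1, 2, 3])
```

Only the element bytes are hashed here, so arrays with the same bytes but a different shape or element type (for example `[1, 2, 3, 4]` and a 2×2 matrix with the same memory layout) get the same digest. If that matters, one option is to write `size(a)` and a type tag into the buffer first.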
You can also do `crc32c(reinterpret(UInt8,a))` (for an array of bitstypes, using the `crc32c` function from the `CRC32c` standard library), rather than `write`. Both `write` and `reinterpret` will lead to an endianness-dependent result, of course (unless you use `hton` or similar first).
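As a rough sketch of the checksum variant with the byte order pinned down, assuming a vector of bits-type integers: the helper name `portable_crc` is hypothetical, and `htol` (little-endian) is used here in place of `hton`, which would work equally well as long as every reader and writer agrees on it. The bytes are copied into a plain `Vector{UInt8}` with `collect`, to stay on the safe side in case the installed `crc32c` only has methods for ordinary byte arrays.

```julia
using CRC32c  # standard library

# Hypothetical helper: CRC-32c of an integer vector, independent of host byte order.
function portable_crc(a::AbstractVector{T}) where {T<:Integer}
    isbitstype(T) || throw(ArgumentError("element type must be a bits type"))
    le = htol.(a)                            # no-op on little-endian hosts, byte swap otherwise
    bytes = collect(reinterpret(UInt8, le))  # copy into a plain Vector{UInt8}
    return crc32c(bytes)                     # UInt32 checksum
end

c = portable_crc([1, 2, 3])
```

A CRC is short and fast but not collision-resistant, so it fits a "have I computed this already?" cache key better than anything security-sensitive.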
Thanks to both of you! Those are interesting solutions and I’ll probably try out both and see what works. This actually also helps me with a related question I posed here a while back, namely how to make a hash that works for both Julia and Python.
Because I had a lot of data already stored using hash values from Julia 1.4, I decided to pull the hashing code out of Julia Base, preserving its state at v1.4.2, and put it into a separate repository, StableHashes. From there I export the function `shash`, which does (mostly) whatever the `hash` function did in 1.4.2. Since this code was copied directly from the Julia repository, please let me know if that is in any way problematic. I’ve kept the MIT license from Julia, obviously, and I’ve kept all comments about the origin of the code.