Stable hashing across Julia versions

Did something change with respect to hashing since 1.4.2? I have code that checks whether data have already been computed by checking a hash of the arguments used to compute the data, and I appear to be getting different hash values in 1.5.0 compared to 1.4.2.

Perhaps related to this, since the change seems to be related to hashing arrays.

In 1.4.2:

julia> hash([1,2,3])
0xecc5186e7be222c6

julia> hash.([1,2,3])
3-element Array{UInt64,1}:
 0x02011ce34bce797f
 0x5a94851fb48a6e05
 0x3688cf5d48899fa6

In 1.5.0:

julia> hash([1,2,3])
0xd22721a98cab7f9d

julia> hash.([1,2,3])
3-element Array{UInt64,1}:
 0x02011ce34bce797f
 0x5a94851fb48a6e05
 0x3688cf5d48899fa6

Hashing is not guaranteed to be stable across Julia versions.

That’s a really good point. I guess for my code I’ll just have to stick to 1.4.2 for now and think of a better approach in future versions.

If the values are always arrays of ints or something comparable, then you could just use write to get a “canonical” binary representation and use CRC32 or SHA to hash that binary data. If the data is more complex, you could use BSON to serialize it and then hash the resulting BSON data.
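
A minimal sketch of the write-then-hash idea, using the SHA standard library. The helper name sha_of_array is hypothetical. It assumes the arrays hold plain bits values such as Int64 or Float64. The digest depends only on the raw bytes written, not on Base's hash, so it should not change between Julia versions. It does still depend on the element type and the machine's byte order.

```julia
using SHA  # standard library providing sha256

# Hypothetical helper name, intended for arrays of plain bits values.
# write() emits the elements' raw in-memory bytes (native byte order),
# and those bytes are then hashed with SHA-256.
function sha_of_array(a::AbstractArray)
    io = IOBuffer()
    write(io, a)
    return bytes2hex(sha256(take!(io)))
end

key = sha_of_array([1, 2, 3])   # 64-character hex string, usable as a cache key
```

Only the element bytes are hashed. As a result, Int32[1, 2, 3] and Int64[1, 2, 3] give different keys, while a 3-element vector and a 3×1 matrix with the same contents give the same key. If shape or element type matter for your cache, you could also write size(a) and a type tag into the buffer before hashing.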

You can also do crc32c(reinterpret(UInt8,a)) (for an array of bitstypes, using the crc32c function from the CRC32c standard library), rather than write. Both write and reinterpret will lead to an endianness-dependent result, of course (unless you use hton or similar first).
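
A sketch of the reinterpret approach combined with hton, so the checksum does not depend on the host's byte order. The helper name bits_crc is hypothetical, and it is restricted to fixed-size integer element types. The collect call is a cautious choice: it copies the lazy byte view into a plain Vector{UInt8}, so the code does not rely on which array types a particular CRC32c version accepts.

```julia
using CRC32c  # standard library providing crc32c

# Hypothetical helper name. Works for arrays of fixed-size integer types.
function bits_crc(a::AbstractArray{T}) where {T<:Integer}
    isbitstype(T) || throw(ArgumentError("element type $T is not a bits type"))
    # hton converts each element to big-endian byte order, so the bytes (and
    # hence the checksum) are the same on little- and big-endian machines.
    be = hton.(vec(a))
    # reinterpret gives a byte view of the elements; collect copies it into a
    # plain Vector{UInt8} before passing it to crc32c.
    bytes = collect(reinterpret(UInt8, be))
    return crc32c(bytes)   # UInt32 checksum
end

bits_crc([1, 2, 3])
```

CRC-32C is fast, but its result is only 32 bits. Accidental collisions become plausible once a cache holds tens of thousands of entries, so a SHA digest may be the safer key for large stores.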

Thanks to both of you! Those are interesting solutions, and I'll probably try out both and see what works. This also helps with a related question I posted here a while back, namely how to make a hash that works in both Julia and Python.

Because I had a lot of data already stored using hash values from Julia 1.4, I decided to pull the hashing code out of Julia Base, preserving its state at v1.4.2, and put it into a separate repository, StableHashes. It exports the function shash, which reproduces (mostly) what the hash function did in 1.4.2. Since this code was copied directly from the Julia repository, please let me know if that is in any way problematic. I've kept Julia's MIT license, obviously, and I've kept all comments about the origin of the code.