DataType hash differs per patch version and OS

I noticed that hash(Float32) returns a different value on each Julia version.

Other hash methods return the same hash for the same value, e.g.

julia> hash("a")
0xfa232f94411b00cd

is the same on Julia versions 1.3.1, 1.5.0, 1.5.1, and 1.5.2.
But hash(Float32) returns something different on each version. These are the values I observed:

julia> hash(Float32)
0x7bb4922dfd8d6f8a
julia> hash(Float32)
0x2ee851e9579d5f99
julia> hash(Float32)
0x35768f5f3a96aad1
julia> hash(Float32)
0xd237962f65c4d373
julia> hash(Float32)
0x80b1a44c1d020e80

Is this intended, or is it a bug?
And if it’s intended, why does it happen?

Also, why is it different on Windows and Linux?

This isn’t a bug. Hashing has changed over time for some types to make it faster.

I see, that makes sense. But why is it different per OS?

Oh. I missed that part of it. That might be a bug.

A hash is not a checksum. If you need that kind of stability across versions and operating systems, try SHA and the functions provided by that standard library.
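
As a rough sketch of that suggestion: one way to get an identifier for a type that does not depend on the Julia build is to digest the type's printed name with the SHA standard library. The helper name `stable_type_digest` is made up for this illustration, and it assumes the printed name (here "Float32") is itself the same everywhere you need it to be.

```julia
using SHA  # standard library that ships with Julia

# Hypothetical helper (not part of Julia): digest the printed name of a type.
# This hashes the text "Float32", not the type object, so it does not involve
# objectid at all.
stable_type_digest(T::Type) = bytes2hex(sha256(string(T)))

digest = stable_type_digest(Float32)
println(digest)  # 64 hexadecimal characters
```

Because SHA-256 is a fixed algorithm over the input bytes, the digest depends only on the name string, which is the property the built-in hash does not promise.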

If those functions differ between OS, that’s definitely a bug.

The reason hash(Float32) differs so much is the way generic hashing is implemented. It falls back to hashing an internal identifier of the object, which can change with each version, with the OS, and even between builds from the same source code:

In hash(x) at hashing.jl:18
>18  hash(x::Any) = hash(x, zero(UInt))

About to run: (hash)(Float32, 0x0000000000000000)
1|debug> s
In hash(x, h) at hashing.jl:23
>23  hash(@nospecialize(x), h::UInt) = hash_uint(3h - objectid(x))

"""
    objectid(x)

Get a hash value for `x` based on object identity. `objectid(x)==objectid(y)` if `x === y`.
"""
objectid(@nospecialize(x)) = ccall(:jl_object_id, UInt, (Any,), x)
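
For contrast, a small sketch of what can be relied on: inside a single running session the identity-based values agree with themselves, so types are fine as in-memory dictionary keys. The `widths` table is only an illustration. Nothing here says the numbers will match on another build, so they should not be written to disk or compared across machines.

```julia
# Within one session, the same object always gives the same id and hash.
@assert objectid(Float32) == objectid(Float32)
@assert hash(Float32) == hash(Float32)

# Types therefore work as keys of an in-memory Dict.
widths = Dict{DataType,Int}(Float32 => 4, Float64 => 8)  # size in bytes
println(widths[Float32])  # prints 4
```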

That is not the reason for the instability: jl_object_id is stable for many other types. The hash of a DataType changes between builds because it was designed that way (it includes the build time of the module). It’s not a bug. It could be improved to reduce variation between builds, but at least currently there is no guarantee of getting the same value on different builds.

Oh, so the DataType hash differs between Windows and Linux because of different build times?

Hashes for Types in general are just object IDs which are essentially arbitrary.

They are not any more arbitrary than other types of hash.
