Deterministic tie breaking with a hash function, with desired statistical property

Background:
I am using a priority queue to search paths on an integer lattice. In general, there is not a unique minimum element in the priority queue, so I augment the priorities with a value that is designed to break ties deterministically. This ensures that the search order does not depend on how the priority queue itself breaks ties. Setting different seeds for this hash also lets me get different paths, corresponding to different arbitrary choices made during the search algorithm.

For example, in the priority queue I may have two paths with the same priority:

2 => [(1, 1), (1, 2)]
2 => [(1, 1), (2, 1)]

and to break ties, I do

(2, 0xcafecafe) => [(1, 1), (1, 2)]
(2, 0xbeefbeef) => [(1, 1), (2, 1)]

where 0xcafecafe is derived from [(1, 1), (1, 2)] and some seed, and 0xbeefbeef is derived, with the same computation, from [(1, 1), (2, 1)] and the same seed.

Situation:
I have been using hash to generate these tie-breaker values, however I notice that it seems to be not ideal. In the first place, hash is allowed to change between versions of Julia. That doesn’t bother me too much.

I would like these numbers to be closer to 0.5:

julia> f(n) = count(x -> hash((1, 2), x) > hash((2, 1), x), rand(UInt64, n)) / n
f (generic function with 1 method)

julia> f(1_000_000)
0.434676

julia> f(1_000_000)
0.434022

julia> f(1_000_000)
0.434066

julia> f(1_000_000_000)
0.434203099

I would like to avoid using a RNG with “external” state. It would be nice to preserve ties between paths that are actually equal.

Any recommendations or keywords?

1 Like

Seems like just applying the hash function again gets me what I want:

julia> f(n, v1, v2) = count(x -> hash(hash(v1, x)) > hash(hash(v2, x)), rand(UInt64, n)) / n
f (generic function with 2 methods)

julia> f(1_000_000, (1, 2), (2, 1))
0.500247

julia> f(1_000_000, (1, 2), (2, 1))
0.500344

julia> f(1_000_000, (1, 2), (2, 1))
0.499908

I’d give each value inserted into the priority queue an “index” and try to find out where a new inserted value would be inserted into the priority queue. If the next (or previous) value is the same (==, not ===), take its “index”, increment it by one and use that as its priority. So basically have

(2, 1) => [(1, 1), (1, 2)]
(2, 2) => [(1, 1), (2, 1)]

and so on, depending on when either was inserted.

Another option would be to query the priority queue for all equivalently important values and just pick one at random (if your priority queue supports that operation).