Deterministic tie breaking with a hash function, with desired statistical property

goretkin · June 23, 2021, 11:35pm

Background:
I am using a priority queue to search paths on an integer lattice. In general, there is not a unique minimum element in the priority queue, so I augment the priorities with a value that is designed to break ties deterministically. This ensures that the search order does not depend on how the priority queue itself breaks ties. Setting different seeds for this hash also lets me get different paths, corresponding to different arbitrary choices made during the search algorithm.

For example, in the priority queue I may have two paths with the same priority:

2 => [(1, 1), (1, 2)]
2 => [(1, 1), (2, 1)]

and to break ties, I do

(2, 0xcafecafe) => [(1, 1), (1, 2)]
(2, 0xbeefbeef) => [(1, 1), (2, 1)]

where 0xcafecafe is derived from [(1, 1), (1, 2)] and some seed, and 0xbeefbeef is derived, with the same computation, from [(1, 1), (2, 1)] and the same seed.

Situation:
I have been using hash to generate these tie-breaker values, however I notice that it seems to be not ideal. In the first place, hash is allowed to change between versions of Julia. That doesn’t bother me too much.

I would like these numbers to be closer to 0.5:

julia> f(n) = count(x -> hash((1, 2), x) > hash((2, 1), x), rand(UInt64, n)) / n
f (generic function with 1 method)

julia> f(1_000_000)
0.434676

julia> f(1_000_000)
0.434022

julia> f(1_000_000)
0.434066

julia> f(1_000_000_000)
0.434203099

I would like to avoid using a RNG with “external” state. It would be nice to preserve ties between paths that are actually equal.

Any recommendations or keywords?

goretkin · June 24, 2021, 5:46am

Seems like just applying the hash function again gets me what I want:

julia> f(n, v1, v2) = count(x -> hash(hash(v1, x)) > hash(hash(v2, x)), rand(UInt64, n)) / n
f (generic function with 2 methods)

julia> f(1_000_000, (1, 2), (2, 1))
0.500247

julia> f(1_000_000, (1, 2), (2, 1))
0.500344

julia> f(1_000_000, (1, 2), (2, 1))
0.499908

Sukera · June 24, 2021, 6:11am

I’d give each value inserted into the priority queue an “index” and try to find out where a new inserted value would be inserted into the priority queue. If the next (or previous) value is the same (==, not ===), take its “index”, increment it by one and use that as its priority. So basically have

(2, 1) => [(1, 1), (1, 2)]
(2, 2) => [(1, 1), (2, 1)]

and so on, depending on when either was inserted.

Another option would be to query the priority queue for all equivalently important values and just pick one at random (if your priority queue supports that operation).

Topic		Replies	Views
A question on hashes General Usage	3	1581	December 14, 2017
I want to use a PriorityQueue equivalent with duplicate keys General Usage question , data_structures	4	335	December 14, 2023
Faster updates in a priority queue Performance	4	547	January 14, 2021
Persistent hash of julia object General Usage	12	3092	January 20, 2017
A priority queue with a vector of priorities General Usage	1	667	May 29, 2019

Deterministic tie breaking with a hash function, with desired statistical property

Related topics