# Priority queue choice

I need a performant implementation of a `PriorityQueue{Int,Float64}`. For `Int` you can think pixels and for Float64 think random times.

Is this `PriorityQueue` from DataStructures.jl the right thing or are there maybe other choices? I have that the integer is always from `1:n` and that at any time `n` objects are in the queue: an object leaving the queue gets inserted again with new priority the old priority `t` plus some random increment, `t + randexp()`, until convergence of the algorithm.

1 Like

MWE

``````using Random
using DataStructures

function f(n, T)
Q = PriorityQueue{Int,Float64}()
x = zeros(Int, n)
t = 0.0
for i in eachindex(x)
enqueue!(Q, i => t + randexp())
end
while t < T
i, t = dequeue_pair!(Q)
x[i] += 1
enqueue!(Q, i => t + randexp())
Q[mod1(i-1, n)] = t + 0.5randexp()
Q[mod1(i+1, n)] = t + 0.5randexp()
end
x
end

f(10, 100.0) # [158, 182, 172, 169, 158, 160, 163, 153, 157, 156]
``````

I will shamelessly suggest my package `TrackingHeaps.jl`. If you can use a Heap (i.e., at a given time you only need the topmost priority element), and your keys are Ints, and you may need the extra tracking features it uses, then it is good.

I can give you support if you need (the package is documented and has tests). It is not a popular package however (I never announced it, and people using it probably just stumbled randomly on it).

5 Likes

Nice.

``````using Random
using DataStructures

function f(n, T)
Q = PriorityQueue{Int,Float64}()
x = zeros(Int, n)
t = 0.0
for i in eachindex(x)
enqueue!(Q, i => t + 10randexp())
end
while t < T
i, t = peek(Q)
x[i] += 1
Q[i] = t + 10randexp()
Q[mod1(i-1, n)] = t + 10randexp()
Q[mod1(i+1, n)] = t + 10randexp()
end
x
end

using TrackingHeaps
using Distributions

function g(n, T, Q = TrackingHeap(Int, S=NoTrainingWheels()))
x = zeros(Int, n)
t = 0
for i in eachindex(x)
Q[i] = t + rand(Geometric(0.1))
end
while t < T
i, t = TrackingHeaps.top(Q)
x[i] += 1
Q[i] = t + rand(Geometric(0.1))
Q[mod1(i-1, n)] = t + rand(Geometric(0.1))
Q[mod1(i+1, n)] = t + rand(Geometric(0.1))
end
x
end

julia> @time f(10000, 10000.0)
6.715294 seconds (40 allocations: 955.469 KiB)

julia> @time g(10000, 10000)
2.279287 seconds (51 allocations: 848.500 KiB)
``````
2 Likes

My fricking god, you were faster than me, the person that actually wrote the thing, XD.

``````using Random
using DataStructures
using BenchmarkTools
using TrackingHeaps

function f(n, T)
Random.seed!(0)
Q = PriorityQueue{Int,Float64}()
x = zeros(Int, n)
t = 0.0
for i in eachindex(x)
enqueue!(Q, i => t + randexp())
end
while t < T
# Updated here to be fair.
i, t = peek(Q)
x[i] += 1
Q[i] = t + randexp()
Q[mod1(i-1, n)] = t + 0.5randexp()
Q[mod1(i+1, n)] = t + 0.5randexp()
end
x
end

function f2(n, T)
Random.seed!(0)
Q = TrackingHeap(Float64)
sizehint!(Q, n)
x = zeros(Int, n)
t = 0.0
for i in eachindex(x)
# for a fresh TrackingHeap track! will already use the keys you want
track!(Q, t + randexp())
end
while t < T
# The TrackingHeaps.{top,update!} are necessary only because
# DataStructures was imported too, and both define such function.
# Here it is not efficient to remove and then insert again,
# let us just peek and update.
i, t = TrackingHeaps.top(Q)
x[i] += 1
TrackingHeaps.update!(Q, i => t + randexp())
# The two `setindex!` below could be changed to `update!`s too.
Q[mod1(i-1, n)] = t + 0.5randexp()
Q[mod1(i+1, n)] = t + 0.5randexp()
end
x
end

@show @btime f(10, 100.0);
# 395.686 ÎĽs (12 allocations: 1.45 KiB)
@show @btime f2(10, 100.0);
# 318.332 ÎĽs (6776 allocations: 159.53 KiB)
@show @btime f(20, 200.0);
# 1.619 ms (16 allocations: 3.44 KiB)
@show @btime f2(20, 200.0);
# 1.239 ms (26710 allocations: 626.97 KiB)
@show @btime f(30, 300.0);
# 4.038 ms (16 allocations: 3.53 KiB)
@show @btime f2(30, 300.0);
# 2.745 ms (60152 allocations: 1.38 MiB)
``````

I tried to be a little more fair to `DataStructures.jl` so my results are not so much better, XD (i.e., I did not use `NoTrainingWheels` like you). But I perceived some excessive allocation, I think my package can be improved. Maybe I will look a little on that in the weekend.

I think this is because your constructor is not type stable. So I use a kind of function barrier

``````function f2(n, T, Q = TrackingHeap(Float64))
Random.seed!(0)
``````

but you perhaps also want to provide a stable constructor.

Ah, I thought it could be this, but I expected the compiler to work-around it, XD.

You can use `TrackingHeap{Int, Float64, 2, MinHeapOrder, NoTrainingWheels}(Float64[], Int[], Int[])`, but is a mouthful, and it exposes a little the internals (but there is basically no risk of them changing).

There are a bunch of other heaps in DataStructures.jl depending on your exact needs.
The cool `MinMaxHeap` is one of them.
There is also a PR waitng to be merge that will significantly speed up all heap performance

I need to work on a type-stable constructor that already receives the values (and optionally the keys) in heap order, so it builds the structure without basically no overhead.

Seeing that you are a contributor of `DataStructures.jl` what do you think of my package? (If you do not have the time do not worry.) It could be added to `DataStructures.jl`? Changes in `DataStructures.jl` will make it obsolete in the close future? It will not be made obsolete but is too complex/specific to be merged?

I think `PriorityQueue` is the right data structure, if only `index` could be replaced by a vector (instead of a dict.) I am spending two-thirds of time in https://github.com/JuliaCollections/DataStructures.jl/blob/6dea392834f9a3bffa3b8d17ab309db93c810c32/src/priorityqueue.jl#L133

Oh, yes. The limitation of my package of only using integers as keys is what allows it to use a vector instead of a Dict, and one of the things that pushed me into developing it.

How else differs your approach? You are not only replacing DataStructureâ€™s `index` by a vector, donâ€™t you?

I would have to know more about your library. Feel free to open an issue on DataStructures

How else differs your approach? You are not only replacing DataStructureâ€™s `index` by a vector, donâ€™t you?

I had a look at `PriorityQueue` code and this is not so distant of reality. The datastructure of both is simple (it is a heap after all), the main things that `PriorityQueue` does differently are:

1. The `index` array is a `Dict` instead of a `Vector` as you pointed out (this has considerable implications for both performance and usability thought, it is not a light decision to make).
2. The ordering is both a field in the data structure and a type parameter (not sure why? in my code is just a type parameter).
3. This is already kind of behavior, but it should be noted `PriorityQueue` is always a binary heap (you cannot easily make it ternary, or any arbitrary N-arity, as in `TrackingHeaps`).

Conceptually, they are a little different too. `PriorityQueue` has a concept of `priority` and `key` (that do not need to have an ordering), you want to store `key`s but you must give them a `priority` (that has an ordering) that defines its position in the heap. I find this a little confusing (in many use cases the `key` may be `nothing`, you just want the `priority`). `TrackingHeap` has a concept of `values` and `trackers`, you want to store `values` (that have an ordering), `TrackingHeap` gives you and allows you to use `trackers` to interact with the stored values efficiently, but such `trackers` can be entirely ignored if not needed (they are a consequence, not a requirement).

What drive me to write `TrackingHeaps` was mostly providing optimized methods that I had no way to write (without breakage) using only the public API of `MutableBinaryHeap`. I say that in the package github front page:

1. a non-top value can be deleted without being made top first;
2. convenience methods allow to `pop!` / `delete!` stored values and immediatelly `track!` others, avoiding double-rebalancing;

I am not sure if I overlooked `PriorityQueue` (because it is not in the heap section, even being a heap). The option (1) does not make sense to `PriorityQueue`. The (2) has yet some reason to be, if in a deletion-plus-addition the tracker deleted cannot be reused for the addition (that tracker/key is either made unusable for the rest of the algorithm or will made available again only in specific circumstances).

I probably need to update the comparison to refer to `PriorityQueue`.

My use case at the time was a very optimized backtrack algorithm that dealt with a heap in a very performance-critical and specific way (values were removed and others added in their place, but the old key could not be reused until the backtrack reached some specific points where all references pointing to that key did not exist anymore).

2 Likes

I have now â€śvendorizedâ€ť the DataStructures implementation and replace the index with a vector https://github.com/mschauer/ZigZagBoomerang.jl/blob/master/src/priorityqueue.jl

Have a look

Can you make the data structure choice a type parameter?

Do you have any data about how the performance of this priority queue version compares to my package?

When I compared, yours was still faster, but same ball park. More importantly I only implemented a small subset of the full interface I think.

Thanks for the information.