Priority queue choice

I need a performant implementation of a PriorityQueue{Int,Float64}. For Int you can think pixels and for Float64 think random times.

Is this PriorityQueue from DataStructures.jl the right thing or are there maybe other choices? I have that the integer is always from 1:n and that at any time n objects are in the queue: an object leaving the queue gets inserted again with new priority the old priority t plus some random increment, t + randexp(), until convergence of the algorithm.

1 Like

MWE

using Random
using DataStructures

function f(n, T)
    Q = PriorityQueue{Int,Float64}()
    x = zeros(Int, n)
    t = 0.0
    for i in eachindex(x)
        enqueue!(Q, i => t + randexp())
    end
    while t < T
        i, t = dequeue_pair!(Q)
        x[i] += 1
        enqueue!(Q, i => t + randexp())
        Q[mod1(i-1, n)] = t + 0.5randexp()
        Q[mod1(i+1, n)] = t + 0.5randexp()
    end
    x
end

f(10, 100.0) # [158, 182, 172, 169, 158, 160, 163, 153, 157, 156]

I will shamelessly suggest my package TrackingHeaps.jl. If you can use a Heap (i.e., at a given time you only need the topmost priority element), and your keys are Ints, and you may need the extra tracking features it uses, then it is good.

I can give you support if you need (the package is documented and has tests). It is not a popular package however (I never announced it, and people using it probably just stumbled randomly on it).

5 Likes

Nice.

using Random
using DataStructures

function f(n, T)
    Q = PriorityQueue{Int,Float64}()
    x = zeros(Int, n)
    t = 0.0
    for i in eachindex(x)
        enqueue!(Q, i => t + 10randexp())
    end
    while t < T
        i, t = peek(Q)
        x[i] += 1
        Q[i] = t + 10randexp()
        Q[mod1(i-1, n)] = t + 10randexp()
        Q[mod1(i+1, n)] = t + 10randexp()
    end
    x
end


using TrackingHeaps
using Distributions

function g(n, T, Q = TrackingHeap(Int, S=NoTrainingWheels()))
    x = zeros(Int, n)
    t = 0
    for i in eachindex(x)
        Q[i] = t + rand(Geometric(0.1))
    end
    while t < T
        i, t = TrackingHeaps.top(Q)
        x[i] += 1
        Q[i] = t + rand(Geometric(0.1))
        Q[mod1(i-1, n)] = t + rand(Geometric(0.1))
        Q[mod1(i+1, n)] = t + rand(Geometric(0.1))
    end
    x
end

julia> @time f(10000, 10000.0)
  6.715294 seconds (40 allocations: 955.469 KiB)

julia> @time g(10000, 10000)
  2.279287 seconds (51 allocations: 848.500 KiB)
2 Likes

My fricking god, you were faster than me, the person that actually wrote the thing, XD.

using Random
using DataStructures
using BenchmarkTools
using TrackingHeaps

function f(n, T)
    Random.seed!(0)
    Q = PriorityQueue{Int,Float64}()
    x = zeros(Int, n)
    t = 0.0
    for i in eachindex(x)
        enqueue!(Q, i => t + randexp())
    end
    while t < T
        # Updated here to be fair.
        i, t = peek(Q)
        x[i] += 1
        Q[i] = t + randexp()
        Q[mod1(i-1, n)] = t + 0.5randexp()
        Q[mod1(i+1, n)] = t + 0.5randexp()
    end
    x
end

function f2(n, T)
    Random.seed!(0)
		Q = TrackingHeap(Float64)
		sizehint!(Q, n)
    x = zeros(Int, n)
    t = 0.0
    for i in eachindex(x)
        # for a fresh TrackingHeap track! will already use the keys you want
        track!(Q, t + randexp())
    end
    while t < T
        # The TrackingHeaps.{top,update!} are necessary only because
				# DataStructures was imported too, and both define such function.
        # Here it is not efficient to remove and then insert again,
				# let us just peek and update.
        i, t = TrackingHeaps.top(Q)
        x[i] += 1
        TrackingHeaps.update!(Q, i => t + randexp())
				# The two `setindex!` below could be changed to `update!`s too.
        Q[mod1(i-1, n)] = t + 0.5randexp()
        Q[mod1(i+1, n)] = t + 0.5randexp()
    end
    x
end

@show @btime f(10, 100.0);
# 395.686 ÎĽs (12 allocations: 1.45 KiB)
@show @btime f2(10, 100.0);
# 318.332 ÎĽs (6776 allocations: 159.53 KiB)
@show @btime f(20, 200.0);
# 1.619 ms (16 allocations: 3.44 KiB)
@show @btime f2(20, 200.0);
# 1.239 ms (26710 allocations: 626.97 KiB)
@show @btime f(30, 300.0);
# 4.038 ms (16 allocations: 3.53 KiB)
@show @btime f2(30, 300.0);
# 2.745 ms (60152 allocations: 1.38 MiB)

I tried to be a little more fair to DataStructures.jl so my results are not so much better, XD (i.e., I did not use NoTrainingWheels like you). But I perceived some excessive allocation, I think my package can be improved. Maybe I will look a little on that in the weekend.

I think this is because your constructor is not type stable. So I use a kind of function barrier

function f2(n, T, Q = TrackingHeap(Float64))
    Random.seed!(0)    

but you perhaps also want to provide a stable constructor.

Ah, I thought it could be this, but I expected the compiler to work-around it, XD.

You can use TrackingHeap{Int, Float64, 2, MinHeapOrder, NoTrainingWheels}(Float64[], Int[], Int[]), but is a mouthful, and it exposes a little the internals (but there is basically no risk of them changing).

There are a bunch of other heaps in DataStructures.jl depending on your exact needs.
The cool MinMaxHeap is one of them.
There is also a PR waitng to be merge that will significantly speed up all heap performance

I need to work on a type-stable constructor that already receives the values (and optionally the keys) in heap order, so it builds the structure without basically no overhead.

Seeing that you are a contributor of DataStructures.jl what do you think of my package? (If you do not have the time do not worry.) It could be added to DataStructures.jl? Changes in DataStructures.jl will make it obsolete in the close future? It will not be made obsolete but is too complex/specific to be merged?

I think PriorityQueue is the right data structure, if only index could be replaced by a vector (instead of a dict.) I am spending two-thirds of time in https://github.com/JuliaCollections/DataStructures.jl/blob/6dea392834f9a3bffa3b8d17ab309db93c810c32/src/priorityqueue.jl#L133

Oh, yes. The limitation of my package of only using integers as keys is what allows it to use a vector instead of a Dict, and one of the things that pushed me into developing it.

How else differs your approach? You are not only replacing DataStructure’s index by a vector, don’t you?

I would have to know more about your library. Feel free to open an issue on DataStructures

How else differs your approach? You are not only replacing DataStructure’s index by a vector, don’t you?

I had a look at PriorityQueue code and this is not so distant of reality. The datastructure of both is simple (it is a heap after all), the main things that PriorityQueue does differently are:

  1. The index array is a Dict instead of a Vector as you pointed out (this has considerable implications for both performance and usability thought, it is not a light decision to make).
  2. The ordering is both a field in the data structure and a type parameter (not sure why? in my code is just a type parameter).
  3. This is already kind of behavior, but it should be noted PriorityQueue is always a binary heap (you cannot easily make it ternary, or any arbitrary N-arity, as in TrackingHeaps).

Conceptually, they are a little different too. PriorityQueue has a concept of priority and key (that do not need to have an ordering), you want to store keys but you must give them a priority (that has an ordering) that defines its position in the heap. I find this a little confusing (in many use cases the key may be nothing, you just want the priority). TrackingHeap has a concept of values and trackers, you want to store values (that have an ordering), TrackingHeap gives you and allows you to use trackers to interact with the stored values efficiently, but such trackers can be entirely ignored if not needed (they are a consequence, not a requirement).

What drive me to write TrackingHeaps was mostly providing optimized methods that I had no way to write (without breakage) using only the public API of MutableBinaryHeap. I say that in the package github front page:

  1. a non-top value can be deleted without being made top first;
  2. convenience methods allow to pop! / delete! stored values and immediatelly track! others, avoiding double-rebalancing;

I am not sure if I overlooked PriorityQueue (because it is not in the heap section, even being a heap). The option (1) does not make sense to PriorityQueue. The (2) has yet some reason to be, if in a deletion-plus-addition the tracker deleted cannot be reused for the addition (that tracker/key is either made unusable for the rest of the algorithm or will made available again only in specific circumstances).

I probably need to update the comparison to refer to PriorityQueue.

My use case at the time was a very optimized backtrack algorithm that dealt with a heap in a very performance-critical and specific way (values were removed and others added in their place, but the old key could not be reused until the backtrack reached some specific points where all references pointing to that key did not exist anymore).

2 Likes

I have now “vendorized” the DataStructures implementation and replace the index with a vector https://github.com/mschauer/ZigZagBoomerang.jl/blob/master/src/priorityqueue.jl

Have a look

Can you make the data structure choice a type parameter?

Do you have any data about how the performance of this priority queue version compares to my package?

When I compared, yours was still faster, but same ball park. More importantly I only implemented a small subset of the full interface I think.

Thanks for the information.