Fast(er) priority queues?

Hi there!
Recently I wanted to accelerate Dijkstra’s algorithm, and I ended up implementing a really dumb priority queue based on two sorted vectors. Surprisingly, in some settings, it significantly speeds up the algorithm. Would something like that deserve a place in DataStructures.jl?

See here for the rationale and there for the benchmark.

Ping @oxinabox @mschauer @mbesancon

2 Likes

It would be worth benchmarking this in 1.9 (Dict is about 30% faster for a lot of operations), and with SwissDict and RobinDict from DataStructures.

1 Like

Just a notice, the link in the page of benchmarkings is broken. Is it the same as this β€œPriority Queues and Dijkstra’s Algorithm”?

(BTW, nice package. :slightly_smiling_face: )

I have to point out that I have a package with a faster priority queue for years (as it can be seen in Priority queue choice - #3 by Henrique_Becker), and it did not make into DataStructures.jl. In general, I do not find very hard to outperform the data structures from DataStructures.jl.

3 Likes

Yes it’s the same, but it is not broken on my browser :person_shrugging:

Yes indeed, I cite your Discourse discussion with Moritz in my package docs :upside_down_face: Maybe I should include a more explicit reference to your package itself, sorry about that. Until this morning, the distinction between priority queue and heap was not very clear in my mind tbh :sweat_smile:

Did you try to contribute to DataStructures.jl though?

Great, I’ll take a look! My implementation is really dumb btw, and I can probably do better, at this point it is really more of a POC.
In fact, I’m not even sure that it deserves to go into DataStructures.jl: maybe the performance gap tells us more about Dijkstra than it does about priority queues

Don’t pay attention, look like it was temporal. It works now. :sweat_smile:

I do remember some brief discussion with @oxinabox, but they have a very similar datastructure (their mutable binary heap). My heap has some extra functionality (it is k-ary with the arity coded in type space by means of a type parameter, for example) but I think it deviated a little from what is expected from DataStructures.jl, as they have more than a heap type and try to have all of them to share a similar interface.

1 Like

I’ll benchmark against their heaps as well if I have time. What stopped me sofar is that the API is slightly different (no enqueue! or dequeue! function) but I can specialize Dijkstra to work with that

A few gotchas:
no_priority_updates makes no sense for PriorityQueue, because it can’t have a duplicate key. The error is not triggered in the unweighted version because a the heuristic distance of a vertex is never re-updated, as the first check always correspond to the shortest. I removed it.

Benchmarking with unweighted graph is suspicious. As said above, it cause the distance to never be re-updated. Also, the priority queue will always hold only to different weights: the one of the vertex being evaluated, and this value + 1, which may not be very representative of the use case of a priority queue. So I added benchmark for weighted graphs also.

Here is my code:

using BenchmarkTools
using DataStructures
using FastPriorityQueues
using Graphs
using LinearAlgebra
using SparseArrays
using Test

function dijkstra_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = first(q)
        dequeue!(q)
        d[u] = k
        for v in outneighbors(g, u)
            if d[u] + w[u, v] < d[v]
                d[v] = d[u] + w[u, v]
                d[v] == Inf ? enqueue!(q, v, d[v]) : q[v] = d[v]
            end
        end
    end
    return d
end

function dijkstra_no_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = first(q)
        dequeue!(q)
        if k <= d[u]
            d[u] = k
            for v in outneighbors(g, u)
                if d[u] + w[u, v] < d[v]
                    d[v] = d[u] + w[u, v]
                    enqueue!(q, v, d[v])
                end
            end
        end
    end
    return d
end

function random_weights(g)
    srcs = src.(edges(g))
    dsts = dst.(edges(g))
    w = rand(ne(g))
    Symmetric(sparse(srcs, dsts, w, ne(g), ne(g))) # assuming src < dst
end


n = 10
g = Graphs.grid([n, n])
w = random_weights(g)


d1 = dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), g, 1)[end];
d2 = dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), g, 1)[end];
d3 = dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), g, 1)[end];
@test d1 β‰ˆ d2 β‰ˆ d3

d1 = dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), g, 1, w)[end];
d2 = dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), g, 1, w)[end];
d3 = dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), g, 1, w)[end];
@test d1 β‰ˆ d2 β‰ˆ d3

n = 100
g = Graphs.grid([n, n])
w = random_weights(g)

@benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1)
@benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1)
@benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1)

@benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1, $w)
@benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1, $w)
@benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1, $w)

And here are the results:

julia> @benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 124 samples with 1 evaluation.
 Range (min … max):  39.799 ms …  45.018 ms  β”Š GC (min … max): 15.24% … 13.04%
 Time  (median):     40.171 ms               β”Š GC (median):    15.15%
 Time  (mean Β± Οƒ):   40.327 ms Β± 559.114 ΞΌs  β”Š GC (mean Β± Οƒ):  15.12% Β±  0.23%

        β–„  β–ˆβ–…β–‚β–…     β–‚                                           
  β–ƒβ–ƒβ–†β–†β–‡β–‡β–ˆβ–‡β–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–†β–†β–…β–ˆβ–…β–β–†β–†β–β–…β–ˆβ–ƒβ–ƒβ–†β–…β–β–β–β–β–…β–β–ƒβ–β–β–β–β–β–ƒβ–β–β–…β–β–…β–β–β–β–β–…β–β–ƒβ–β–β–β–β–ƒ β–ƒ
  39.8 ms         Histogram: frequency by time         41.6 ms <

 Memory estimate: 87.80 MiB, allocs estimate: 70130.

julia> @benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 540 samples with 1 evaluation.
 Range (min … max):  9.181 ms …  11.195 ms  β”Š GC (min … max): 0.00% … 15.54%
 Time  (median):     9.234 ms               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   9.264 ms Β± 115.018 ΞΌs  β”Š GC (mean Β± Οƒ):  0.03% Β±  0.67%

        β–ˆβ–ƒ                                                     
  β–†β–‡β–ƒβ–ƒβ–‚β–ˆβ–ˆβ–ˆβ–†β–…β–ƒβ–„β–ƒβ–ƒβ–„β–ƒβ–„β–ƒβ–ƒβ–ƒβ–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–β–‚β–‚β–‚β–β–‚β–‚β–β–‚β–β–‚β–‚β–β–β–β–‚β–β–β–‚β–β–β–β–β–β–β–β–β–‚β–‚ β–ƒ
  9.18 ms         Histogram: frequency by time         9.6 ms <

 Memory estimate: 92.97 KiB, allocs estimate: 7.

julia> @benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 2913 samples with 1 evaluation.
 Range (min … max):  1.670 ms …  3.855 ms  β”Š GC (min … max): 0.00% … 45.78%
 Time  (median):     1.691 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   1.708 ms Β± 96.639 ΞΌs  β”Š GC (mean Β± Οƒ):  0.21% Β±  2.14%

     β–‚β–‚β–‡β–ˆβ–†β–„β–                                                  
  β–‚β–ƒβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–…β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–„β–„β–„β–„β–„β–„β–ƒβ–ƒβ–‚β–‚β–‚β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  1.67 ms        Histogram: frequency by time        1.81 ms <

 Memory estimate: 92.97 KiB, allocs estimate: 7.

julia> @benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 79 samples with 1 evaluation.
 Range (min … max):  61.760 ms … 66.214 ms  β”Š GC (min … max): 12.46% … 11.60%
 Time  (median):     64.585 ms              β”Š GC (median):    15.68%
 Time  (mean Β± Οƒ):   63.763 ms Β±  1.416 ms  β”Š GC (mean Β± Οƒ):  14.24% Β±  1.65%

     β–…β–… β–…β–… β–‚                               β–‚β–ˆβ–ˆβ–…β–‚β–‚  β–‚    β–‚  β–‚   
  β–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–…β–ˆβ–…β–ˆβ–…β–…β–β–β–β–…β–β–…β–ˆβ–β–β–…β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–…β–…β–β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–β–ˆβ–ˆβ–…β–…β–β–…β–ˆβ–β–β–ˆβ–ˆ ▁
  61.8 ms         Histogram: frequency by time        65.8 ms <

 Memory estimate: 154.20 MiB, allocs estimate: 70086.

julia> @benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 151 samples with 1 evaluation.
 Range (min … max):  32.278 ms …  34.562 ms  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     33.061 ms               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   33.132 ms Β± 597.476 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

   ▁▃ ▁▄ β–ˆ  ▆▁       ▁     ▁    ▁                 ▁ ▁ β–†  β–ƒ β–„    
  β–‡β–ˆβ–ˆβ–†β–ˆβ–ˆβ–‡β–ˆβ–„β–„β–ˆβ–ˆβ–‡β–β–‡β–†β–„β–β–„β–ˆβ–‡β–†β–β–†β–†β–ˆβ–„β–‡β–‡β–β–ˆβ–„β–†β–„β–†β–„β–‡β–†β–‡β–β–β–β–†β–„β–β–β–‡β–β–ˆβ–„β–ˆβ–‡β–ˆβ–‡β–†β–ˆβ–‡β–ˆβ–‡β–„ β–„
  32.3 ms         Histogram: frequency by time         34.1 ms <

 Memory estimate: 121.47 KiB, allocs estimate: 8.

julia> @benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 1204 samples with 1 evaluation.
 Range (min … max):  3.982 ms …   7.216 ms  β”Š GC (min … max): 0.00% … 24.18%
 Time  (median):     4.065 ms               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   4.142 ms Β± 243.461 ΞΌs  β”Š GC (mean Β± Οƒ):  0.07% Β±  1.06%

  β–ˆβ–…β–…β–†β–‡β–…β–…β–„β–ƒβ–ƒβ–ƒβ–β– ▁                                              
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‡β–†β–ˆβ–…β–‡β–†β–ˆβ–ˆβ–‡β–„β–…β–…β–„β–„β–†β–…β–‡β–†β–†β–…β–…β–†β–†β–…β–„β–†β–†β–β–…β–β–„β–„β–„β–…β–…β–…β–†β–„β–† β–ˆ
  3.98 ms      Histogram: log(frequency) by time      5.11 ms <

 Memory estimate: 121.47 KiB, allocs estimate: 8.

So I still get a huge speedup.

1 Like

I think the huge speedup we both saw is due to a bias in my initial code. For some reason, the function first is very inefficient when applied to a DataStructures.PriorityQueue, and that explains most of the gap. Replace u, k = first(q); dequeue!(q) with u, k = dequeue_pair!(q) and we go back to a x2 speedup instead of x10.

To confirm this, I replaced dijkstra_priority_updates! with Graphs.dijkstra_shortest_path, and the results changed drastically.

1 Like

Indeed, I also get much better timings with dequeue_pair, I will post the new benchmark

plot_12
plot_14
Here’s what I get when correcting this bug (source code is in FastPriorityQueues.jl)

1 Like

Edit: passed weights in the benchmark
Here are the results:
plot

This is run on weighted grids, PriorityQueue use priority updates, while the three other data structures don’t. As expected, Priority queue behave similarly as the Graphs.jl implementation, while the SortedVectorPriorityQueue and HeapPriorityQueue are a little bit faster. VectorPriorityQueue just seems not worth it.

Here is my code, based on the actual benchmark of FastPriorityQueues:

using FastPriorityQueues
using BenchmarkTools  #src
using DataStructures
using FastPriorityQueues
using Graphs
using Plots
using Test
using SparseArrays
using LinearAlgebra
using Random

function dijkstra_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = dequeue_pair!(q)
        d[u] = k
        for v in outneighbors(g, u)
            if d[u] + w[u, v] < d[v]
                d[v] = d[u] + w[u, v]
                d[v] == Inf ? enqueue!(q, v, d[v]) : q[v] = d[v]
            end
        end
    end
    return d
end


function dijkstra_no_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.0
    enqueue!(q, s, 0.0)
    while !isempty(q)
        u, d_u = dequeue_pair!(q)
        if d_u <= d[u]
            d[u] = d_u
            for v in outneighbors(g, u)
                d_v = d[u] + w[u, v]
                if d_v < d[v]
                    enqueue!(q, v, d_v)
                    d[v] = d_v
                end
            end
        end
    end
    return d
end

function random_weights(g)
    srcs = src.(edges(g))
    dsts = dst.(edges(g))
    rng = Xoshiro(1234);
    w = rand(rng, ne(g))
    Symmetric(sparse(srcs, dsts, w, ne(g), ne(g))) # assuming src < dst
end

test_dijkstra_default(g, w=weights(g)) = Graphs.dijkstra_shortest_paths(g, 1, w).dists[end];
test_dijkstra_priority_updates(g, qtype, w=weights(g)) = dijkstra_priority_updates!(qtype(), g, 1, w)[end];
test_dijkstra_no_priority_updates(g, qtype, w=weights(g)) = dijkstra_no_priority_updates!(qtype(), g, 1, w)[end];

g_small = Graphs.grid([10, 10])
w_small = random_weights(g_small)


d1 = test_dijkstra_default(g_small);
d2 = test_dijkstra_priority_updates(g_small, PriorityQueue{Int,Float64});
d3 = test_dijkstra_no_priority_updates(g_small, VectorPriorityQueue{Int,Float64});
d4 = test_dijkstra_no_priority_updates(g_small, SortedVectorPriorityQueue{Int,Float64});
d5 = test_dijkstra_no_priority_updates(g_small, HeapPriorityQueue{Int,Float64});
@test d1 β‰ˆ d2 β‰ˆ d3 β‰ˆ d4 β‰ˆ d5

d1 = test_dijkstra_default(g_small, w_small);
d2 = test_dijkstra_priority_updates(g_small, PriorityQueue{Int,Float64}, w_small);
d3 = test_dijkstra_no_priority_updates(g_small, VectorPriorityQueue{Int,Float64}, w_small);
d4 = test_dijkstra_no_priority_updates(g_small, SortedVectorPriorityQueue{Int,Float64}, w_small);
d5 = test_dijkstra_no_priority_updates(g_small, HeapPriorityQueue{Int,Float64}, w_small);
@test d1 β‰ˆ d2 β‰ˆ d3 β‰ˆ d4 β‰ˆ d5

function compare_dijkstra_versions(n_values, weighted=true)
    speed_gains = Dict(
        "PriorityQueue" => Float64[],
        "VectorPriorityQueue" => Float64[],
        "SortedVectorPriorityQueue" => Float64[],
        "HeapPriorityQueue" => Float64[],
    )
    memory_gains = Dict(
        "PriorityQueue" => Float64[],
        "VectorPriorityQueue" => Float64[],
        "SortedVectorPriorityQueue" => Float64[],
        "HeapPriorityQueue" => Float64[],
    )
    for n in n_values
        @info "Testing grids of side length $n"
        g = Graphs.grid([n, n])
        if weighted
            w = random_weights(g)
        else
            w = weights(g)
        end
        _, t0, m0, _, _ = @timed for _ in 1:5
            test_dijkstra_default(g, w)
        end
        _, t1, m1, _, _ = @timed for _ in 1:5
            test_dijkstra_priority_updates(g, PriorityQueue{Int,Float64}, w);
        end
        _, t3, m3, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, VectorPriorityQueue{Int,Float64}, w);
        end
        _, t4, m4, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, SortedVectorPriorityQueue{Int,Float64}, w);
        end
        _, t5, m5, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, HeapPriorityQueue{Int,Float64}, w
![plot|600x400](upload://fDd1mzQVRFHaBpzZ096T9iTkNYI.png)
);
        end
        push!(speed_gains["PriorityQueue"], t0 / t1)
        push!(speed_gains["VectorPriorityQueue"], t0 / t3)
        push!(speed_gains["SortedVectorPriorityQueue"], t0 / t4)
        push!(speed_gains["HeapPriorityQueue"], t0 / t5)
        push!(memory_gains["PriorityQueue"], m0 / m1)
        push!(memory_gains["VectorPriorityQueue"], m0 / m3)
        push!(memory_gains["SortedVectorPriorityQueue"], m0 / m4)
        push!(memory_gains["HeapPriorityQueue"], m0 / m5)

    end
    return speed_gains, memory_gains
end

# To gain precision, we could replace the built-in `@elapsed` with `@belapsed` from [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl).

n_values = [10, 30, 100, 300, 1000]
speed_gains, memory_gains = compare_dijkstra_versions(n_values)

# Finally, let us plot the results.

settings = [
    "PriorityQueue", "SortedVectorPriorityQueue", "VectorPriorityQueue", "HeapPriorityQueue"
]
S = length(settings)
N = length(n_values)

# And we follow up with memory use

plt = plot(;
    title="Dijkstra with no priority updates",
    xlabel="Grid side length",
    ylabel="Memory gain wrt. Graphs.jl",
    ylim=(0, Inf),
    xticks=(1:N, string.(n_values)),
    margin=5Plots.mm,
    # legend_position=:topleft
)
for (k, setting) in enumerate(settings)
    bar!(
        plt,
        (1:N) .+ 0.7 * (k - (S + 1) / 2) / S,
        memory_gains[setting];
        label=setting,
        bar_width=0.7 / S,
    )
end
plt


# First we compare execution time

plt = plot(;
    title="Dijkstra with no priority updates",
    xlabel="Grid side length",
    ylabel="Speed gain wrt. Graphs.jl",
    ylim=(0, Inf),
    xticks=(1:N, string.(n_values)),
    margin=5Plots.mm,
    legend_position=:legend
)
for (k, setting) in enumerate(settings)
    bar!(
        plt,
        (1:N) .+ 0.7 * (k - (S + 1) / 2) / S,
        speed_gains[setting];
        label=setting,
        bar_width=0.7 / S,
    )
end
plt

OK so this sounds more reasonable than my initial (outrageous) claims :sweat_smile:
If these speedups can be observed on other types of graphs, maybe it would be worth it to switch to a heap-based Dijkstra. In particular, I’m curious about what would happen with dense graphs

Edit: passed weights in the benchmark
Went a bit further, it looks like this:
plot2
HeapPriorityQueue seems to scale pretty good, SortedVectorPriorityQueue scales badly.

On your benchmark, SortedVectorPriorityQueue and HeapPriorityQueue seems to be slower than on mine, I will try to dig into it.

Edit Oups, did not passed weights in the function :sweat_smile:. I will fix the benchmarks

Okay then, looks like the heap is a serious contender. HeapPriorityQueue is just a dumb wrapper to be able to use enqueue!/dequeue!, but if we go that way we might as well directly use push!/pop! in Dijkstra

1 Like