Fast(er) priority queues?

gdalle · May 18, 2022, 4:10pm

Hi there!
Recently I wanted to accelerate Dijkstra’s algorithm, and I ended up implementing a really dumb priority queue based on two sorted vectors. Surprisingly, in some settings, it significantly speeds up the algorithm. Would something like that deserve a place in DataStructures.jl?

See here for the rationale and there for the benchmark.

Ping @oxinabox @mschauer @mbesancon

Oscar_Smith · May 18, 2022, 4:21pm

It would be worth benchmarking this in 1.9 (Dict is about 30% faster for a lot of operations), and with SwissDict and RobinDict from DataStructures.

suavesito · May 18, 2022, 6:29pm

Just a notice, the link in the page of benchmarkings is broken. Is it the same as this “Priority Queues and Dijkstra’s Algorithm”?

(BTW, nice package. )

Henrique_Becker · May 18, 2022, 7:54pm

I have to point out that I have a package with a faster priority queue for years (as it can be seen in Priority queue choice - #3 by Henrique_Becker), and it did not make into DataStructures.jl. In general, I do not find very hard to outperform the data structures from DataStructures.jl.

gdalle · May 19, 2022, 3:53am

Yes it’s the same, but it is not broken on my browser

gdalle · May 19, 2022, 3:56am

Yes indeed, I cite your Discourse discussion with Moritz in my package docs Maybe I should include a more explicit reference to your package itself, sorry about that. Until this morning, the distinction between priority queue and heap was not very clear in my mind tbh

gdalle · May 19, 2022, 3:57am

Did you try to contribute to DataStructures.jl though?

gdalle · May 19, 2022, 4:00am

Great, I’ll take a look! My implementation is really dumb btw, and I can probably do better, at this point it is really more of a POC.
In fact, I’m not even sure that it deserves to go into DataStructures.jl: maybe the performance gap tells us more about Dijkstra than it does about priority queues

suavesito · May 19, 2022, 4:06am

Don’t pay attention, look like it was temporal. It works now.

Henrique_Becker · May 20, 2022, 12:28am

I do remember some brief discussion with @oxinabox, but they have a very similar datastructure (their mutable binary heap). My heap has some extra functionality (it is k-ary with the arity coded in type space by means of a type parameter, for example) but I think it deviated a little from what is expected from DataStructures.jl, as they have more than a heap type and try to have all of them to share a similar interface.

gdalle · May 20, 2022, 4:47am

I’ll benchmark against their heaps as well if I have time. What stopped me sofar is that the API is slightly different (no enqueue! or dequeue! function) but I can specialize Dijkstra to work with that

etienne_dg · May 20, 2022, 11:05am

A few gotchas:
no_priority_updates makes no sense for PriorityQueue, because it can’t have a duplicate key. The error is not triggered in the unweighted version because a the heuristic distance of a vertex is never re-updated, as the first check always correspond to the shortest. I removed it.

Benchmarking with unweighted graph is suspicious. As said above, it cause the distance to never be re-updated. Also, the priority queue will always hold only to different weights: the one of the vertex being evaluated, and this value + 1, which may not be very representative of the use case of a priority queue. So I added benchmark for weighted graphs also.

Here is my code:

using BenchmarkTools
using DataStructures
using FastPriorityQueues
using Graphs
using LinearAlgebra
using SparseArrays
using Test

function dijkstra_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = first(q)
        dequeue!(q)
        d[u] = k
        for v in outneighbors(g, u)
            if d[u] + w[u, v] < d[v]
                d[v] = d[u] + w[u, v]
                d[v] == Inf ? enqueue!(q, v, d[v]) : q[v] = d[v]
            end
        end
    end
    return d
end

function dijkstra_no_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = first(q)
        dequeue!(q)
        if k <= d[u]
            d[u] = k
            for v in outneighbors(g, u)
                if d[u] + w[u, v] < d[v]
                    d[v] = d[u] + w[u, v]
                    enqueue!(q, v, d[v])
                end
            end
        end
    end
    return d
end

function random_weights(g)
    srcs = src.(edges(g))
    dsts = dst.(edges(g))
    w = rand(ne(g))
    Symmetric(sparse(srcs, dsts, w, ne(g), ne(g))) # assuming src < dst
end


n = 10
g = Graphs.grid([n, n])
w = random_weights(g)


d1 = dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), g, 1)[end];
d2 = dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), g, 1)[end];
d3 = dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), g, 1)[end];
@test d1 ≈ d2 ≈ d3

d1 = dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), g, 1, w)[end];
d2 = dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), g, 1, w)[end];
d3 = dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), g, 1, w)[end];
@test d1 ≈ d2 ≈ d3

n = 100
g = Graphs.grid([n, n])
w = random_weights(g)

@benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1)
@benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1)
@benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1)

@benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1, $w)
@benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1, $w)
@benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1, $w)

And here are the results:

julia> @benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 124 samples with 1 evaluation.
 Range (min … max):  39.799 ms …  45.018 ms  ┊ GC (min … max): 15.24% … 13.04%
 Time  (median):     40.171 ms               ┊ GC (median):    15.15%
 Time  (mean ± σ):   40.327 ms ± 559.114 μs  ┊ GC (mean ± σ):  15.12% ±  0.23%

        ▄  █▅▂▅     ▂                                           
  ▃▃▆▆▇▇█▇▆████▇█▆▆▅█▅▁▆▆▁▅█▃▃▆▅▁▁▁▁▅▁▃▁▁▁▁▁▃▁▁▅▁▅▁▁▁▁▅▁▃▁▁▁▁▃ ▃
  39.8 ms         Histogram: frequency by time         41.6 ms <

 Memory estimate: 87.80 MiB, allocs estimate: 70130.

julia> @benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 540 samples with 1 evaluation.
 Range (min … max):  9.181 ms …  11.195 ms  ┊ GC (min … max): 0.00% … 15.54%
 Time  (median):     9.234 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.264 ms ± 115.018 μs  ┊ GC (mean ± σ):  0.03% ±  0.67%

        █▃                                                     
  ▆▇▃▃▂███▆▅▃▄▃▃▄▃▄▃▃▃▂▃▃▃▃▃▃▂▂▂▁▂▂▂▁▂▂▁▂▁▂▂▁▁▁▂▁▁▂▁▁▁▁▁▁▁▁▂▂ ▃
  9.18 ms         Histogram: frequency by time         9.6 ms <

 Memory estimate: 92.97 KiB, allocs estimate: 7.

julia> @benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1)
BenchmarkTools.Trial: 2913 samples with 1 evaluation.
 Range (min … max):  1.670 ms …  3.855 ms  ┊ GC (min … max): 0.00% … 45.78%
 Time  (median):     1.691 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.708 ms ± 96.639 μs  ┊ GC (mean ± σ):  0.21% ±  2.14%

     ▂▂▇█▆▄▁                                                  
  ▂▃▆████████▅▃▂▂▂▂▂▂▃▃▄▄▄▄▄▄▃▃▂▂▂▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.67 ms        Histogram: frequency by time        1.81 ms <

 Memory estimate: 92.97 KiB, allocs estimate: 7.

julia> @benchmark dijkstra_priority_updates!(PriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 79 samples with 1 evaluation.
 Range (min … max):  61.760 ms … 66.214 ms  ┊ GC (min … max): 12.46% … 11.60%
 Time  (median):     64.585 ms              ┊ GC (median):    15.68%
 Time  (mean ± σ):   63.763 ms ±  1.416 ms  ┊ GC (mean ± σ):  14.24% ±  1.65%

     ▅▅ ▅▅ ▂                               ▂██▅▂▂  ▂    ▂  ▂   
  ▅███████▅█▅█▅▅▁▁▁▅▁▅█▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▁██████▁██▅▅▁▅█▁▁██ ▁
  61.8 ms         Histogram: frequency by time        65.8 ms <

 Memory estimate: 154.20 MiB, allocs estimate: 70086.

julia> @benchmark dijkstra_no_priority_updates!(VectorPriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 151 samples with 1 evaluation.
 Range (min … max):  32.278 ms …  34.562 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     33.061 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.132 ms ± 597.476 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▃ ▁▄ █  ▆▁       ▁     ▁    ▁                 ▁ ▁ ▆  ▃ ▄    
  ▇██▆██▇█▄▄██▇▁▇▆▄▁▄█▇▆▁▆▆█▄▇▇▁█▄▆▄▆▄▇▆▇▁▁▁▆▄▁▁▇▁█▄█▇█▇▆█▇█▇▄ ▄
  32.3 ms         Histogram: frequency by time         34.1 ms <

 Memory estimate: 121.47 KiB, allocs estimate: 8.

julia> @benchmark dijkstra_no_priority_updates!(SortedVectorPriorityQueue{Int,Float64}(), $g, 1, $w)
BenchmarkTools.Trial: 1204 samples with 1 evaluation.
 Range (min … max):  3.982 ms …   7.216 ms  ┊ GC (min … max): 0.00% … 24.18%
 Time  (median):     4.065 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.142 ms ± 243.461 μs  ┊ GC (mean ± σ):  0.07% ±  1.06%

  █▅▅▆▇▅▅▄▃▃▃▁▁ ▁                                              
  █████████████████▇██▇▆█▅▇▆██▇▄▅▅▄▄▆▅▇▆▆▅▅▆▆▅▄▆▆▁▅▁▄▄▄▅▅▅▆▄▆ █
  3.98 ms      Histogram: log(frequency) by time      5.11 ms <

 Memory estimate: 121.47 KiB, allocs estimate: 8.

So I still get a huge speedup.

gdalle · May 20, 2022, 12:26pm

I think the huge speedup we both saw is due to a bias in my initial code. For some reason, the function first is very inefficient when applied to a DataStructures.PriorityQueue, and that explains most of the gap. Replace u, k = first(q); dequeue!(q) with u, k = dequeue_pair!(q) and we go back to a x2 speedup instead of x10.

To confirm this, I replaced dijkstra_priority_updates! with Graphs.dijkstra_shortest_path, and the results changed drastically.

etienne_dg · May 20, 2022, 12:39pm

Indeed, I also get much better timings with dequeue_pair, I will post the new benchmark

gdalle · May 20, 2022, 1:08pm

plot_12
plot_14
Here’s what I get when correcting this bug (source code is in FastPriorityQueues.jl)

etienne_dg · May 20, 2022, 1:09pm

Edit: passed weights in the benchmark
Here are the results:
plot

This is run on weighted grids, PriorityQueue use priority updates, while the three other data structures don’t. As expected, Priority queue behave similarly as the Graphs.jl implementation, while the SortedVectorPriorityQueue and HeapPriorityQueue are a little bit faster. VectorPriorityQueue just seems not worth it.

Here is my code, based on the actual benchmark of FastPriorityQueues:

using FastPriorityQueues
using BenchmarkTools  #src
using DataStructures
using FastPriorityQueues
using Graphs
using Plots
using Test
using SparseArrays
using LinearAlgebra
using Random

function dijkstra_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.
    enqueue!(q, s, 0.)
    while !isempty(q)
        u, k = dequeue_pair!(q)
        d[u] = k
        for v in outneighbors(g, u)
            if d[u] + w[u, v] < d[v]
                d[v] = d[u] + w[u, v]
                d[v] == Inf ? enqueue!(q, v, d[v]) : q[v] = d[v]
            end
        end
    end
    return d
end


function dijkstra_no_priority_updates!(
    q::Q, g::AbstractGraph{T}, s::Integer, w=weights(g)
) where {Q,T}
    d = fill(Inf, nv(g))
    d[s] = 0.0
    enqueue!(q, s, 0.0)
    while !isempty(q)
        u, d_u = dequeue_pair!(q)
        if d_u <= d[u]
            d[u] = d_u
            for v in outneighbors(g, u)
                d_v = d[u] + w[u, v]
                if d_v < d[v]
                    enqueue!(q, v, d_v)
                    d[v] = d_v
                end
            end
        end
    end
    return d
end

function random_weights(g)
    srcs = src.(edges(g))
    dsts = dst.(edges(g))
    rng = Xoshiro(1234);
    w = rand(rng, ne(g))
    Symmetric(sparse(srcs, dsts, w, ne(g), ne(g))) # assuming src < dst
end

test_dijkstra_default(g, w=weights(g)) = Graphs.dijkstra_shortest_paths(g, 1, w).dists[end];
test_dijkstra_priority_updates(g, qtype, w=weights(g)) = dijkstra_priority_updates!(qtype(), g, 1, w)[end];
test_dijkstra_no_priority_updates(g, qtype, w=weights(g)) = dijkstra_no_priority_updates!(qtype(), g, 1, w)[end];

g_small = Graphs.grid([10, 10])
w_small = random_weights(g_small)


d1 = test_dijkstra_default(g_small);
d2 = test_dijkstra_priority_updates(g_small, PriorityQueue{Int,Float64});
d3 = test_dijkstra_no_priority_updates(g_small, VectorPriorityQueue{Int,Float64});
d4 = test_dijkstra_no_priority_updates(g_small, SortedVectorPriorityQueue{Int,Float64});
d5 = test_dijkstra_no_priority_updates(g_small, HeapPriorityQueue{Int,Float64});
@test d1 ≈ d2 ≈ d3 ≈ d4 ≈ d5

d1 = test_dijkstra_default(g_small, w_small);
d2 = test_dijkstra_priority_updates(g_small, PriorityQueue{Int,Float64}, w_small);
d3 = test_dijkstra_no_priority_updates(g_small, VectorPriorityQueue{Int,Float64}, w_small);
d4 = test_dijkstra_no_priority_updates(g_small, SortedVectorPriorityQueue{Int,Float64}, w_small);
d5 = test_dijkstra_no_priority_updates(g_small, HeapPriorityQueue{Int,Float64}, w_small);
@test d1 ≈ d2 ≈ d3 ≈ d4 ≈ d5

function compare_dijkstra_versions(n_values, weighted=true)
    speed_gains = Dict(
        "PriorityQueue" => Float64[],
        "VectorPriorityQueue" => Float64[],
        "SortedVectorPriorityQueue" => Float64[],
        "HeapPriorityQueue" => Float64[],
    )
    memory_gains = Dict(
        "PriorityQueue" => Float64[],
        "VectorPriorityQueue" => Float64[],
        "SortedVectorPriorityQueue" => Float64[],
        "HeapPriorityQueue" => Float64[],
    )
    for n in n_values
        @info "Testing grids of side length $n"
        g = Graphs.grid([n, n])
        if weighted
            w = random_weights(g)
        else
            w = weights(g)
        end
        _, t0, m0, _, _ = @timed for _ in 1:5
            test_dijkstra_default(g, w)
        end
        _, t1, m1, _, _ = @timed for _ in 1:5
            test_dijkstra_priority_updates(g, PriorityQueue{Int,Float64}, w);
        end
        _, t3, m3, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, VectorPriorityQueue{Int,Float64}, w);
        end
        _, t4, m4, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, SortedVectorPriorityQueue{Int,Float64}, w);
        end
        _, t5, m5, _, _ = @timed for _ in 1:5
            test_dijkstra_no_priority_updates(g, HeapPriorityQueue{Int,Float64}, w
![plot|600x400](upload://fDd1mzQVRFHaBpzZ096T9iTkNYI.png)
);
        end
        push!(speed_gains["PriorityQueue"], t0 / t1)
        push!(speed_gains["VectorPriorityQueue"], t0 / t3)
        push!(speed_gains["SortedVectorPriorityQueue"], t0 / t4)
        push!(speed_gains["HeapPriorityQueue"], t0 / t5)
        push!(memory_gains["PriorityQueue"], m0 / m1)
        push!(memory_gains["VectorPriorityQueue"], m0 / m3)
        push!(memory_gains["SortedVectorPriorityQueue"], m0 / m4)
        push!(memory_gains["HeapPriorityQueue"], m0 / m5)

    end
    return speed_gains, memory_gains
end

# To gain precision, we could replace the built-in `@elapsed` with `@belapsed` from [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl).

n_values = [10, 30, 100, 300, 1000]
speed_gains, memory_gains = compare_dijkstra_versions(n_values)

# Finally, let us plot the results.

settings = [
    "PriorityQueue", "SortedVectorPriorityQueue", "VectorPriorityQueue", "HeapPriorityQueue"
]
S = length(settings)
N = length(n_values)

# And we follow up with memory use

plt = plot(;
    title="Dijkstra with no priority updates",
    xlabel="Grid side length",
    ylabel="Memory gain wrt. Graphs.jl",
    ylim=(0, Inf),
    xticks=(1:N, string.(n_values)),
    margin=5Plots.mm,
    # legend_position=:topleft
)
for (k, setting) in enumerate(settings)
    bar!(
        plt,
        (1:N) .+ 0.7 * (k - (S + 1) / 2) / S,
        memory_gains[setting];
        label=setting,
        bar_width=0.7 / S,
    )
end
plt


# First we compare execution time

plt = plot(;
    title="Dijkstra with no priority updates",
    xlabel="Grid side length",
    ylabel="Speed gain wrt. Graphs.jl",
    ylim=(0, Inf),
    xticks=(1:N, string.(n_values)),
    margin=5Plots.mm,
    legend_position=:legend
)
for (k, setting) in enumerate(settings)
    bar!(
        plt,
        (1:N) .+ 0.7 * (k - (S + 1) / 2) / S,
        speed_gains[setting];
        label=setting,
        bar_width=0.7 / S,
    )
end
plt

gdalle · May 20, 2022, 1:13pm

OK so this sounds more reasonable than my initial (outrageous) claims
If these speedups can be observed on other types of graphs, maybe it would be worth it to switch to a heap-based Dijkstra. In particular, I’m curious about what would happen with dense graphs

etienne_dg · May 20, 2022, 2:20pm

Edit: passed weights in the benchmark
Went a bit further, it looks like this:
plot2
HeapPriorityQueue seems to scale pretty good, SortedVectorPriorityQueue scales badly.

etienne_dg · May 20, 2022, 2:36pm

On your benchmark, SortedVectorPriorityQueue and HeapPriorityQueue seems to be slower than on mine, I will try to dig into it.

Edit Oups, did not passed weights in the function . I will fix the benchmarks

gdalle · May 20, 2022, 3:45pm

Okay then, looks like the heap is a serious contender. HeapPriorityQueue is just a dumb wrapper to be able to use enqueue!/dequeue!, but if we go that way we might as well directly use push!/pop! in Dijkstra

Topic		Replies	Views
Priority queue choice Performance	24	3216	September 1, 2022
Faster updates in a priority queue Performance	4	547	January 14, 2021
PriorityQueue and containers Data	3	2232	July 12, 2017
Basic datastructures benchmark Performance	1	289	October 22, 2021
Is priority queue more efficient than a vector / list, in terms of inserting an element into a structure which keeps a decreasing / increasing order? General Usage	4	756	May 22, 2019

Fast(er) priority queues?

Related topics