Test if two vertices are connected in a graph

Hello all,

I would like to create a graph with no weights and efficiently test if vertices are connected (not necessarily neighbors).

I learned I can do this using shortest_path algorithms and connected_components but I wanted to see if there was a more efficient way, especially as the graph complexity grows.

MWE:

using BenchmarkTools
using Graphs
using Random

function is_connected_using_shortest_path(g, source, sink)
    # If there's a more efficient algorithm than a_star, let me know please
    paths = a_star(g, source, sink);
    return length(paths) > 0;
end

function is_connected_using_connected_components(g, source, sink)
    for cluster in connected_components(g)
        if source in cluster && sink in cluster
            return true;
        end
    end
    return false;
end

# Creating graph: (1)--(2)--(3)   (4)--(5)
g = path_graph(5);
rem_edge!(g, 3, 4);

Random.seed!(1234);
display(@benchmark is_connected_using_shortest_path(g, data[1], data[2]) setup=(data=rand(collect(1:5), 2)))

Random.seed!(1234);
display(@benchmark is_connected_using_connected_components(g, data[1], data[2]) setup=(data=rand(collect(1:5), 2)))

Output

BenchmarkTools.Trial: 10000 samples with 186 evaluations.
 Range (min … max):  555.914 ns … 29.407 ΞΌs  β”Š GC (min … max): 0.00% … 93.17%
 Time  (median):     694.086 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   825.910 ns Β±  1.122 ΞΌs  β”Š GC (mean Β± Οƒ):  8.80% Β±  6.43%

  β–ƒ β–†β–ˆβ–‡β–†β–‡    
  β–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–‚β–‚β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  556 ns          Histogram: frequency by time         2.24 ΞΌs <

 Memory estimate: 1.09 KiB, allocs estimate: 12.
BenchmarkTools.Trial: 10000 samples with 189 evaluations.
 Range (min … max):  523.810 ns … 42.610 ΞΌs  β”Š GC (min … max):  0.00% … 95.61%
 Time  (median):     593.651 ns              β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   811.357 ns Β±  1.622 ΞΌs  β”Š GC (mean Β± Οƒ):  14.10% Β±  7.06%

  β–ˆβ–‡β–†β–†β–†β–„β–‚β–‚β–„β–„β–ƒβ–ƒβ–‚β–β–β–β–β– ▁          ▁                              β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–†β–‡β–‡β–†β–†β–†β–…β–…β–…β–†β–…β–…β–†β–…β–…β–„β–…β–…β–ƒβ–ƒβ–…β–„ β–ˆ
  524 ns        Histogram: log(frequency) by time      2.07 ΞΌs <

 Memory estimate: 1.31 KiB, allocs estimate: 15.

EDIT1: Removed NetworkX link. I thought they have a function to do this.

How many times will you call this function? (if many, some pre-processing steps could help)

2 Likes

The canonical algorithm is union find.

An example implementation is available here.

A word of warning: is_connected will often not be thread-safe, due to amortized lazy updates (β€œoften” means: In many implementations in many programming languages; no idea what datastructures.jl; does).

1 Like

Thank you for the suggestions, @Dan and @foobar_lv2! Hopefully I incorporated your thoughts correctly. Here is my new code:

using BenchmarkTools
using DataStructures
using Graphs
using Random

function is_connected_using_shortest_path(g, source, sink)
    # If there's a more efficient algorithm than a_star, let me know please
    paths = a_star(g, source, sink);
    return length(paths) > 0;
end

function is_connected_using_connected_components(g, source, sink)
    for cluster in connected_components(g)
        if source in cluster && sink in cluster
            return true;
        end
    end
    return false;
end

function is_connected_using_disjointed_sets(disjoint_set, source, sink)
    return in_same_set(disjoint_set, source, sink);
end



# Creating graph: (1)--(2)--(3)   (4)--(5)
n_vertices = 5;
removed_edge = n_vertices Γ· 2;
g = path_graph(n_vertices);
rem_edge!(g, removed_edge, removed_edge + 1);

# Create disjointed sets beforehand
clusters = connected_components(g);
g_disjoint_sets = IntDisjointSets(nv(g));
for cluster in clusters
    map(i_vertex -> union!(g_disjoint_sets, cluster[1], i_vertex), cluster);
end

Random.seed!(1234);
println("SHORTEST PATHS METHOD");
display(@benchmark is_connected_using_shortest_path(g, data[1], data[2]) setup=(data=rand(collect(1:n_vertices), 2)))

Random.seed!(1234);
println("\n\nCONNECTED COMPONENTS METHOD");
display(@benchmark is_connected_using_connected_components(g, data[1], data[2]) setup=(data=rand(collect(1:n_vertices), 2)))

Random.seed!(1234);
println("\n\nDISJOINTED SETS METHOD");
display(@benchmark is_connected_using_disjointed_sets(g_disjoint_sets, data[1], data[2]) setup=(data=rand(collect(1:n_vertices), 2)))

And here are the results:

SHORTEST PATHS METHOD
BenchmarkTools.Trial: 10000 samples with 181 evaluations.
 Range (min … max):  572.928 ns … 37.107 ΞΌs  β”Š GC (min … max): 0.00% … 93.94%
 Time  (median):     797.238 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   959.663 ns Β±  1.255 ΞΌs  β”Š GC (mean Β± Οƒ):  8.15% Β±  6.35%

  β–ƒβ–„β–†β–‡β–‡β–ˆβ–ˆβ–ˆβ–‡β–‡β–…β–„β–ƒβ–ƒβ–‚β–‚β–ƒβ–‚β–‚β–β–β– ▁▁▁                                   β–ƒ
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆβ–‡β–†β–ˆβ–‡β–‡β–‡β–‡β–ˆβ–ˆβ–‡β–‡β–‡β–…β–…β–…β–„β–†β–„β–†β–‚β–„β–ƒβ–„ β–ˆ
  573 ns        Histogram: log(frequency) by time      2.56 ΞΌs <

 Memory estimate: 1.09 KiB, allocs estimate: 12.


CONNECTED COMPONENTS METHOD
BenchmarkTools.Trial: 10000 samples with 191 evaluations.
 Range (min … max):  542.408 ns … 47.443 ΞΌs  β”Š GC (min … max):  0.00% … 97.71%
 Time  (median):     675.393 ns              β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   863.796 ns Β±  1.762 ΞΌs  β”Š GC (mean Β± Οƒ):  14.74% Β±  7.14%

     β–ƒβ–‡β–ˆβ–ƒ       
  β–ƒβ–„β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–…β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚ β–ƒ
  542 ns          Histogram: frequency by time         2.01 ΞΌs <

 Memory estimate: 1.31 KiB, allocs estimate: 15.


DISJOINTED SETS METHOD
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range (min … max):  39.051 ns … 320.081 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     48.436 ns               β”Š GC (median):    0.00%        
 Time  (mean Β± Οƒ):   50.928 ns Β±  14.828 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ƒβ–‚β–„β–„β–…β–†β–‡β–ˆβ–†β–…β–‚β–                                         ▁       β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–†β–†β–†β–…β–…β–…β–†β–†β–†β–…β–…β–…β–ƒβ–‚β–ƒβ–†β–…β–…β–…β–…β–…β–†β–‡β–‚β–…β–†β–…β–†β–…β–†β–‡β–ˆβ–…β–‡β–…β–„β–…β–„ β–ˆ
  39.1 ns       Histogram: log(frequency) by time       116 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I’m still new to Julia so is the memory estimate of 0 bytes for the last method expected? Also, I tried testing different number of vertices and it seems to scale really well:

Mean Time
Number of vertices Shortest Path Connected Components Disjoined Set Method
10 1.227 ΞΌs 1.152 ΞΌs 56.596 ns
100 2.137 ΞΌs 6.008 ΞΌs 57.994 ns
1000 56.033 ΞΌs 49.573 ΞΌs 58.809 ns

Of course, this doesn’t take into account the time to create the disjointed sets and the graph is still pretty simple but I think it’ll translate to more complex graphs well.

Thanks again!

FWIW, the way of building that is

julia> function unionfind(g)
       dj = IntDisjointSets(nv(g))
       for idx = 1:nv(g)
       for other in neighbors(g, idx)
       union!(dj, idx, other)
       end
       end
       dj
       end

You don’t need to compute clusters beforehand.

The magic of unionfind is that it can be updated via edge addition (not removal, though!).

The correct way using Graphs only is

julia> function labelmake(g)
       labels = zeros(Int, nv(g))
       Graphs.connected_components!(labels, g)
       labels
       end
labelmake (generic function with 1 method)

julia> isconnected(labels, i, j) = labels[i]==labels[j]
isconnected (generic function with 1 method)

The connected_components! method is not exported in graphs. I found it via @less connected_components(g) `.

It is very advisable to always skim the code of all interesting functions you are using, before you start using clever workarounds for API shortcomings like your is_connected_using_connected_components that iterates over the clusters. If the API sucks for your purpose, then there is a chance that it internally uses a better API!

(not all packages / libraries have code where you can easily take a peek. Some code is heavy in non-ascii unicode or macros or has lots of indirections, or is generally hard to read. But it costs you at most 60 seconds to check whether one minute is enough to see what is happening under the hood)

3 Likes