Networkx & lightgraphs.jl shortest path benchmark

I have developped a small package (GraphMLReader.jl ) to load GraphML file to MetaGraph, which keep all the attributes of nodes and edges. I compare the speed of reading GraphML file and dijkstra shortest path of networkx+Python3 and LightGraphs.jl+Julia1.4.

I tested the algorithm on a large transport network, with 109,743 nodes and 379,474 edges. We calculate the average running time of 20 times.

The results are as follows.
Loading time: Python is a bit faster
Shortest path: Julia is about two times faster than Python.

Python3 Julia1.4
Load data 9.77s 12.03s
Dijkstra Shortest path 0.706s 0.354s

The shortest algorithm is implemented as follows.

using LightGraphs
using GraphMLReader

g_file = "data/large_traffic_network.graphml"
 G = GraphMLReader.loadgraphml( g_file );

## read the original node ids
file_path = "data/origin.json"
origin_ids = JSON.parsefile(file_path)
ids = id_dict(G) 
weightfield!(G, :length)  # set the weight attribute

for id_origin in origin_ids[1:20]
    id = ids[ id_origin ]
    @time rst = dijkstra_shortest_paths(G, [id], w)
end

Is there anyway to speed up the shortest path algorithm of the Julia code using LightGraphs.jl ? Currently, the speed is still similar.

Yes, there are a few ways to speed up the DSP:

  1. Squash the graph. You’re probably treating the vertices as Ints. It’s doubtful you have 2^63 vertices, so if you G = squash(G), then you will probably reduce your memory footprint and also speed things up because of SIMD.

  2. Use StaticGraphs.jl to represent the graph. StaticGraph uses a really cache-friendly data structure at the expense of not being able to modify the graph.

  3. Use parallel_dijkstra_shortest_paths(G, origin_ids[1:20], w) (not sure where w is coming from, but whatever.) If you run on a multi-core/multi-node machine with addprocs(), you will be able to run multiple DSPs at the same time.

  4. Use master and give the new dispatch code a try. It might be faster.

Also, we support GraphML in GraphIO.jl but if your implementation is faster we’d love to steal it.

Also also: don’t use @time. Benchmark properly using BenchmarkTools.jl.

How big is the graph? .354 seconds for a DSP seems to indicate a very large graph, but we see 10-100x performance improvements over NetworkX normally.

2 Likes

Thank you so much for your suggestion and great work on LightGraphs.jl.

  1. Squash the graph. You’re probably treating the vertices as Int s. It’s doubtful you have 2^63 vertices, so if you G = squash(G) , then you will probably reduce your memory footprint and also speed things up because of SIMD

There are 109743 vertices, which are treated as Int, from 1 to 109743. squash currently does not support MetaGraph. I does not test this, and it think it has benefit mainly on the memory rather than speed.

  1. Use StaticGraphs.jl to represent the graph. StaticGraph uses a really cache-friendly data structure at the expense of not being able to modify the graph.

I will test this latter. Thanks!

  1. Use parallel_dijkstra_shortest_paths(G, origin_ids[1:20], w) (not sure where w is coming from, but whatever.) If you run on a multi-core/multi-node machine with addprocs() , you will be able to run multiple DSPs at the same time.

I got an error while running the parallel version. I will show that in another response.

  1. Use master and give the new dispatch code a try. It might be faster.

I have not tested this.

Also, we support GraphML in GraphIO.jl but if your implementation is faster we’d love to steal it.

I need to load GraphML to a MetaGraph and have functions to process the node/edge attributes. MetaGraph.jl just change the original ID to Int starting from one, and create the new SimpleGraph. I cannot find the needed functions in GraphML.jl, then I create a small package to handle this.

Also also: don’t use @time . Benchmark properly using BenchmarkTools.jl.

It runs 20 times. The first run is a bit slower. We measure running time from the second run which would have similar results as BenchmarkTools because the code is already compiled. I tested BenchmarkTools, the running time is “0.3538 s” very similar to the current running time “0.354s”

How big is the graph? .354 seconds for a DSP seems to indicate a very large graph, but we see 10-100x performance improvements over NetworkX normally.

The road network has 109,743 nodes and 379,474 edges. I have uploaded for into Github Test data. Anyone can test using that data easily.

Thank you so much for you detailed reply and great contribution on LightGraphs.jl.

After using BenchmarkTools.jl, got very similar results as follows.

Python3 Julia1.4
Load data 9.77s 12.03s
Dijkstra Shortest path 0.706s 0.3538s

@anon94023334 Thank you so much for your suggestion. The speed up of StaticGraph.jl very significant. For the test case, about 7 times faster than LightGraphs.jl. Thanks for your work on StaticGraphs.jl

Shortest path speed

networkx LightGraphs.jl StaticGraph.jl
Dijkstra Shortest path 0.706s 0.3538s 0.0502s
Speedup && networkx 1 2.0X 14.1X

Load data speed

Python3 Julia1.4
Load data 9.77s 12.03s

The code of shortest path using StaticGraph.jl is as follows.

## load graphml file to MetaGraph Object
g_file = "data/large_traffic_network.graphml"
G = GraphMLReader.loadgraphml(g_file);
ids = id_dict(G)
weightfield!(G, :length)
w = weights(G)

## load test original vertices IDs
file_path = "data/origin.json"
origin_ids = JSON.parsefile(file_path)
origin_ids

## prepare StaticGraph and weight matrix
static_G = StaticDiGraph(G.graph)
w_adj = adjacency_matrix( static_G )
w_static = deepcopy(w_adj)
w_static = convert(SparseMatrixCSC{Float64,UInt32}, w_static)
for v in collect(zip(I,J))
    i,j = Int(v[1]), Int(v[2])
    w_static[i,j] = w[i,j]
end

## shortest path of 20 original vertices
ts = []
i = 0
for id_origin in origin_ids[1:20]
    id = ids[ id_origin ]
    t = @belapsed dijkstra_shortest_paths($static_G, [$id], w_static) samples=3
    push!(ts, t)
    i += 1
    @show i
end

@show sum(ts)/length(ts)
2 Likes
  • Why are you creating a new w_static? Can’t you just reuse w? Why do you need to convert the weights to integers in the staticgraph case but not in the metagraph case?

  • We should be 10-100x faster than NX in dijkstra. You can hack around the lack of squash in MetaGraphs by doing G2 = MetaGraph{UInt32, Float64}(G) and then using G2, but please file an issue because we should have squash.

Thank you so much for your swift reply!

  1. Why are you creating a new w_static ? Can’t you just reuse w ? Why do you need to convert the weights to integers in the staticgraph case but not in the metagraph case?

If I directly use w as weight matrix, there will be some errors as following.

## load graphml file to MetaGraph Object
g_file = "/Users/zhangliye/julia_dev/GraphMLReader.jl/data/large_traffic_network.graphml"
G = GraphMLReader.loadgraphml(g_file);
ids = id_dict(G)
weightfield!(G, :length)
w = weights(G)

## load test original vertices IDs
file_path = "/Users/zhangliye/julia_dev/GraphMLReader.jl/data/origin.json"
origin_ids = JSON.parsefile(file_path)

## shortest path of 20 original vertices
ts = []
i = 0
for id_origin in origin_ids[1:20]
    id = ids[ id_origin ]
#     t = @belapsed dijkstra_shortest_paths($static_G, [$id], w_static) samples=3
    t = @belapsed dijkstra_shortest_paths($static_G, [$id], $w) samples=3
    push!(ts, t)
    i += 1
    @show i
    break
end

@show sum(ts)/length(ts)

Error message:

MethodError: no method matching LightGraphs.SimpleGraphs.SimpleEdge(::Int64, ::UInt32)
Closest candidates are:
  LightGraphs.SimpleGraphs.SimpleEdge(::T, !Matched::T) where T<:Integer at /Users/zhangliye/.julia/packages/LightGraphs/siFgP/src/SimpleGraphs/simpleedge.jl:7

Stacktrace:
 [1] getindex(::MetaGraphs.MetaWeights{Int64,Float64}, ::Int64, ::UInt32) at /Users/zhangliye/.julia/packages/MetaGraphs/fmYHJ/src/MetaGraphs.jl:196
 [2] dijkstra_shortest_paths(::StaticDiGraph{UInt32,UInt32}, ::Array{Int64,1}, ::MetaGraphs.MetaWeights{Int64,Float64}; allpaths::Bool, trackvertices::Bool) at /Users/zhangliye/.julia/packages/LightGraphs/siFgP/src/shortestpaths/dijkstra.jl:95
 [3] dijkstra_shortest_paths at /Users/zhangliye/.julia/packages/LightGraphs/siFgP/src/shortestpaths/dijkstra.jl:66 [inlined]
 [4] ##core#1011(::StaticDiGraph{UInt32,UInt32}, ::Int64, ::MetaGraphs.MetaWeights{Int64,Float64}) at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:371
 [5] ##sample#1012(::BenchmarkTools.Parameters) at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:377
 [6] _run(::BenchmarkTools.Benchmark{Symbol("##benchmark#1010")}, ::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Symbol,Integer,NTuple{4,Symbol},NamedTuple{(:samples, :evals, :gctrial, :gcsample),Tuple{Int64,Int64,Bool,Bool}}}) at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:405
 [7] (::Base.var"#inner#2"{Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}},typeof(BenchmarkTools._run),Tuple{BenchmarkTools.Benchmark{Symbol("##benchmark#1010")},BenchmarkTools.Parameters}})() at ./essentials.jl:715
 [8] #invokelatest#1 at ./essentials.jl:716 [inlined]
 [9] #run_result#37 at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:32 [inlined]
 [10] run(::BenchmarkTools.Benchmark{Symbol("##benchmark#1010")}, ::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Iterators.Pairs{Symbol,Integer,NTuple{5,Symbol},NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample),Tuple{Bool,Int64,Int64,Bool,Bool}}}) at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:94
 [11] #warmup#45 at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:141 [inlined]
 [12] warmup(::BenchmarkTools.Benchmark{Symbol("##benchmark#1010")}) at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:141
 [13] top-level scope at /Users/zhangliye/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:287
 [14] top-level scope at In[200]:18

System information:

Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
  1. We should be 10-100x faster than NX in dijkstra. You can hack around the lack of squash in MetaGraphs by doing G2 = MetaGraph{UInt32, Float64}(G) and then using G2 , but please file an issue because we should have squash .

I have create an issue need squash()

Thank you so much for your suggestions!

Can you report an issue with the weights as well? That looks suspiciously like a bug, and I’d like to take a look at it once things settle down with LightGraphs (it won’t be soon and I’ll forget about this conversation unless there’s a github issue). Thanks.

Thank you so much! I will create an issue. LightGraghs.jl is very flexible and extremely well designed! Thank you so much for your work!

I tested the Graph algorithms using a real transport network, one of the largest transport network in the world. Transport modeling and analysis would be one major application area of LightGraphs.jl. The code using networkx is too slow, which takes several hours to finish one-round calculation. Thus, I tested LightGraphs.jl, which looks very promising.

I have uploaded the traffic network file to Github (Large urban traffic network), and any one can use the data for the testing purpose.

4 Likes