LightGraphs benchmark study, take 2

anon94023334 · May 11, 2020, 4:52pm

Hi all,

You might recall that back in February Tim Lin posted some graph benchmarks, and LightGraphs did reasonably well. In my post at the time, I mentioned that Tim was going to re-run the benchmarks in a few months. Well, he’s just published the latest results with a TON more detail than last time.

I encourage you to read the entire thing, but I realize that it’s a long post. So, here’s the bottom line:

The same packages were tested again: Snap, igraph, networkx, graph-tool, networkit, and LightGraphs. NetworkX and LightGraphs were the only single-language packages tested.
Tim used the master branch of LightGraphs which will eventually become version 2.0. This uses a dispatch-based API with some significant performance improvements, especially in the multi-threading code.
Tim compared multithreading code (he did last time too, but we didn’t have a good story to tell in the 1.3 series) where it was supported in the packages.
LightGraphs tied or beat every other package tested, across every graph type, in every test except for graph loading. Where we beat a package, it was usually by a significant amount.

Here’s an example for the Google-web graph (values are times in seconds):

test	lightgraphs	graph-tool	igraph	networkit	networkx	snap
connected components	0.29	0.28	1.38	0.37	7.77	1.56
k-core	0.16	0.39	0.92	0.83	42.6	1.31
graph loading	16.75	11.02	3.87	4.38	19.24	7.56
pagerank	0.06	0.36	2.42	0.1	33.5	2.31
shortest paths	0.01	0.08	0.41	0.14	3.41	0.26

Some notable quotes:

Lightgraphs is ~300x faster than networkx on the single source shortest path problem, and approximately 10x faster than the other competitors.

Once again, lightgraphs performs extremely well on the k-core decomposition, being more than >250x faster than networkx.

The move from lightgraphs 1.x to 2.0 has also led to significant all-round performance gains with many multi-threading algorithms now available.

Lightgraphs is an anomaly over here since it’s [sic] performance is really fast even on 1 thread and relatively consistent across all settings.

I’m very excited by this benchmark study – it validates a whole ton of work the core team has put into the package over the past several months.

I’d like to conclude with a call for help: if you have a chance, please check out the GraphIO.jl package. It needs some love, particularly in performance. The Edgelist format is the one that was used here. It’s laughably unoptimized. Jump in the #graphs channel in Slack if you have questions, and please open issues and/or PRs with suggestions.

Thanks to everyone in the Julia community for your support. This is a really exciting time to be part of things. We can’t wait to share version 2.0 with you all (later this summer, with any luck). In the meantime, if you want to get a taste of the changes, check out the LightGraphs master branch. (There are deprecation warnings throughout so your old code will work but will be really noisy and slower until you migrate to the new API.)

cc @jpfairbanks @simonschoelly @mbesancon @abhinav_mehndiratta

viralbshah · May 11, 2020, 4:58pm

Dear Seth and rest of the Lightgraphs team,

Wow, this is really amazing! I always personally imagined that Julia would become a great language for combinatorial computing, and provide a great story for combining combinatorial and numerical computing. I had no idea the performance was this good already!

The other systems you are benchmarking against, seem to be written in C++, Python, etc. Do you have some intuition on what makes Julia do so well? Many of the performance gains look like they may not be simply explained with Julia’s language performance only - suggesting that there are perhaps algorithmic improvements too.

I have no doubt that with all the recent CSV.jl and IO improvements, GraphIO will catch up on the graph loading benchmarks.

-viral

anon94023334 · May 11, 2020, 5:08pm

The core code on the packages (except for networkx) is indeed C/C++. I believe we do as well or better than they do for a combination of reasons:

Our algorithms are state-of-the-art. Breadth First Search, in particular, uses a frontier-based approach that, while not novel/exclusive, is the current state-of-the-art that some packages haven’t gotten to yet.
We have more threading options available than other packages. An example: NetworKit trounced us in k-core in the last benchmark study. Turns out, they were using a threaded version, and we didn’t have one yet. However, there was an error in the benchmark in that the graph fed to NetworKit was undirected, where it was supposed to be directed. The catch this time is that NetworKit doesn’t thread its k-core code for directed graphs, but we do now, so when that benchmark was corrected, we claimed the lead.
Data structures matter. As an example, our slow load time is partly the consequence of reading in Int64s and then converting them to the smallest-width type that can accommodate the graph (in this case, UInt32). Moving from 64 bit vertices to 32-bit vertices halves the running time of most algorithms, and uses less space to boot. If we used StaticDiGraph{UInt32} instead of SimpleDiGraph{UInt32}, we’d get an ADDITIONAL 20-50% performance boost on all algorithms. (Next time!)
At its base, performance is all about opcodes (well, disregarding memory access patterns as a result of the data structures above). The Julia compiler produces code that is as efficient as its C++ counterparts. My expectation is that if the same algorithms in LightGraphs were written in C/C++, we would see identical performance.

Oh, and about NetworkX? It’s a great package – don’t get me wrong; it was what got me started in graphs to begin with – but it’s impossible to be as flexible as it is and achieve good performance, especially with the “everything is a dictionary” model that they use. LightGraphs was created with a different set of objectives: Correctness, Performance, and Flexibility, in that order. NetworkX prioritizes flexibility over performance. As for correctness, Aric Hagberg’s package is the primary tool LG uses to validate results. If NetworkX gives a different result than LG, then we go bug-hunting in our code.

erlebach · June 22, 2020, 7:18pm

I cannot find StaticDiGraph anywhere in the source code of LightGraphs.jl. Where is it? Thanks.

anon94023334 · June 22, 2020, 9:03pm

I cannot find StaticDiGraph anywhere in the source code of LightGraphs.jl . Where is it? Thanks.

It’s in the StaticGraphs.jl package.

anon94023334 · June 22, 2020, 9:03pm

I just checked them and all three are working for me.

Pei · August 18, 2021, 10:07pm

Anyone tried to run the benchmarks? I am seeing some function usage inconsistencies between lightgraphs.jl in the benchmark and LightGraphs.jl , e.g., LightGraphs.ShortestPaths VS. LightGraphs.Experimental.ShortestPaths. I did some search and found the discussion (LightGraphs users (and other interested parties): please opine below) about different ways to implement functions/modules in developing LightGraphs 2.0. But I could not find recent updates. Does anyone know what version of LightGraphs.jl I should use for the benchmark? Thanks.

Topic		Replies	Views
[ANN] New benchmarks for LightGraphs Package Announcements performance , lightgraphs , benchmarks	1	1302	February 12, 2020
LightGraphs New to Julia	6	1337	November 8, 2019
Networkx & lightgraphs.jl shortest path benchmark Graphs question , package , lightgraphs	7	4118	May 11, 2020
Best package to work with large graphs/networks New to Julia	1	633	August 29, 2017
Trying to identify possible optimizations (or errors) in a graph algorithm Graphs lightgraphs	24	1797	January 31, 2019

LightGraphs benchmark study, take 2

Related topics