The core code on the packages (except for networkx) is indeed C/C++. I believe we do as well or better than they do for a combination of reasons:
- Our algorithms are state-of-the-art. Breadth First Search, in particular, uses a frontier-based approach that, while not novel/exclusive, is the current state-of-the-art that some packages haven’t gotten to yet.
- We have more threading options available than other packages. An example: NetworKit trounced us in k-core in the last benchmark study. Turns out, they were using a threaded version, and we didn’t have one yet. However, there was an error in the benchmark in that the graph fed to NetworKit was undirected, where it was supposed to be directed. The catch this time is that NetworKit doesn’t thread its k-core code for directed graphs, but we do now, so when that benchmark was corrected, we claimed the lead.
- Data structures matter. As an example, our slow load time is partly the consequence of reading in
Int64
s and then converting them to the smallest-width type that can accommodate the graph (in this case,UInt32
). Moving from 64 bit vertices to 32-bit vertices halves the running time of most algorithms, and uses less space to boot. If we usedStaticDiGraph{UInt32}
instead ofSimpleDiGraph{UInt32}
, we’d get an ADDITIONAL 20-50% performance boost on all algorithms. (Next time!) - At its base, performance is all about opcodes (well, disregarding memory access patterns as a result of the data structures above). The Julia compiler produces code that is as efficient as its C++ counterparts. My expectation is that if the same algorithms in LightGraphs were written in C/C++, we would see identical performance.
Oh, and about NetworkX? It’s a great package – don’t get me wrong; it was what got me started in graphs to begin with – but it’s impossible to be as flexible as it is and achieve good performance, especially with the “everything is a dictionary” model that they use. LightGraphs was created with a different set of objectives: Correctness, Performance, and Flexibility, in that order. NetworkX prioritizes flexibility over performance. As for correctness, Aric Hagberg’s package is the primary tool LG uses to validate results. If NetworkX gives a different result than LG, then we go bug-hunting in our code.