Retworkx: new, high-performance Python graph library

jlapeyre · November 7, 2021, 6:53pm

retworkx is a python library wrapping the petgraph Rust library. (Actually, retworkx itself has a pure-Rust layer enhancing petgraph). retworkx is a strong contender for Most Performant Python Graph Library. Development has industry support and volunteers.

This is interesting for Julia, because the library formerly known as LightGraphs usually compares quite favorably in performance to other libraries; in particular those with a Python interface. Based on the benchmarks linked above, it seems retworkx is faster. Another piece of data: I wrote the betweeness centrality function for retworkx (in the Rust layer). It was the first thing I ever wrote in Rust. But, at the Python level, it outperformed LightGraphs by 50% or so in tests I made while developing. I would like to better understand the data structures used by petgraph to see what their role is; I suspect it’s large.

The comparisons above implement solutions to the DIMACS 9 challenge. I started to implement this in Julia. But, I stalled a bit because despite some effort, I was unable to get a Julia implementation that is competitive with any of the entrants in that repo on the first step, which is graph construction. Of course, this is known to be Graphs.jl (former LG) 's weakest point. Also I did some yak shaving regarding the DIMACS file format; i.e. I might as well make a PR to GraphsIO.jl. But, 1) GraphsIO does not currently support reading attributes, so I would have to introduce a new dependency and extend the interface of loadgraph. 2) which attribute-bearing Julia structure is best for this is not clear, probably MetaGraphs. 3) DIMACS graph file format is apparently widely used in academic research, but it is unfortunately ill-specified and not versioned.

Seth gave an explanation in a Discourse post for why Graphs.jl is slow in reading graphs from files. But, in contrast, I found that, if I built only a SimpleGraph and did not use the weights from the DIMACS file, I could outperform all Python libraries except retworkx, and the latter was only 50% faster or so. I could even parse the weights and put them in an Array and be fast. I just could not put them in a graph structure without taking five times longer than (pure-Python) networkx. It seemed that the problem was the very large memory usage due to nested Dicts.

At any rate, I want to give this a bit of visibility because, aside from being an impressive library, retworkx represents a challenge to the Julia community.

gdalle · November 8, 2021, 6:41am

Hi! Thank you for the post, I didn’t know about retworkx and it looks pretty exciting!
Which structure / package did you use to create the graph? If it is MetaGraphs.jl, then I’m not surprised it was slow, and alternative packages are (more or less) in the works

jlapeyre · November 8, 2021, 12:16pm

I tried both SimpleWeightedGraphs and MetaGraphs. The former was very slow. Contstructing the latter began rather quickly, but slowed down as edges were added. The challenge graph has about 6 x 10^7 edges. I saw there is an “experimental” replacement for MetaGraphs, but did not try that.

jpfairbanks · November 9, 2021, 1:42pm

SimpleWeightedGraphs use a CSC matrix for storage so mutating them is really slow. Constructing it should be as fast as suitsparse’s routine to go from [(rowind, colind,value)] format to CSC.

CameronBieganek · November 9, 2021, 2:56pm

In a hypothetical version 2.0 of Graphs.jl, it would be nice to make the AbstractGraph interface more generic so that it is easier to use graph metadata. Right now there are various issues with attempting to use metadata or attempting to use labels for vertices that are not integers. There are some old issues in the LightGraphs.jl repo that I opened a while ago:

github.com/sbromberger/LightGraphs.jl

Proposal: Relax restriction on vertex IDs

opened 11:12PM - 24 Jun 21 UTC

CameronBieganek

discussion

I've been thinking about API design changes that would make the `AbstractGraph` …interface more generic and make it easier to create graphs with meta-data. I was thinking about something along the lines of ```julia abstract type AbstractGraph{L, V, E} end ``` where `L` is a vertex label type, `V` is a vertex (or vertex meta-data) type, and `E` is an edge (or edge meta-data) type, with some associated machinery for handling vertex labels. However, after thinking about it for a while, I think that would actually be more restrictive than what we need. I think there are basically only two things that we need in order to make the `AbstractGraph` API more generic: - Allow each concrete subtype of `AbstractGraph` to choose its own vertex label type `L`. - Some subtypes of `AbstractGraph` could allow the user to decide the type of the vertex label, for example, a graph type `NotSimpleGraph{L, V, E}`. - Add `add_vertex!`, `rem_vertex!`, `add_edge!`, and `rem_edge!` as optional parts of the API that should be implemented for mutable graphs. I think including the graph modifying verbs in the API would allow individual graphs to decide whether or not they want to use consecutive integer labels (or, of course, non-integer labels). In order to allow for arbitrary vertex label types, the signature for `add_vertex!` would have to change to this: ```julia add_vertex!(g, v) ``` I don't think that would be too much of an imposition for `SimpleGraphs`, since under the current API I have to immediately call `nv(g)` in order to figure out the integer code for the newly added vertex. So the usage pattern for `SimpleGraphs` would change from this: ```julia added = add_vertex!(g) if added v = nv(g) else # recover end ``` to this: ```julia v = nv(g) + 1 added = add_vertex!(g, v) if !added # recover end ``` Although I guess then you would need to manually check for integer overflow in `nv(g) + 1` ... 🤔 With the `add_vertex!(g, v)` version of the API, `v` is meant to be a vertex label, so this might be a stretch of the API definition, but perhaps there could be a `SimpleGraph` method like `add_vertex(g, Increment())`, which at least maintains the arity of `add_vertex!`. Or maybe `SimpleGraph`s could leave checking for integer overflow to the user. `typemax(Int64) ≈ 9.2 * 10^19`, so that seems big enough for most use cases. Then the API for `add_vertex!(g, v)` could be the following: - If `v` already exists in `g`, then `add_vertex!(g, v)` is a no-op. - If `v` does not already exist in `g` and is a valid label, then `v` is added to `g`. - For a `SimpleGraph{T}`, the only valid label is `nv(g) + T(1)`. - For a `SimpleGraph{T}`, it's possible to have `nv(g) + T(1) <= 0`, which would probably result in a `BoundsError`, but it's up to the user to worry about that. Question: With the above change to the `add_vertex!` API, would we still need to return `false` when `add_vertex!(g, v)` is a no-op? Or could we always return `nothing` from `add_vertex!(g, v)`? I realize this would be a pretty big change to the codebase. But this is a feature that I would definitely like to have, so I might be able to find some time to help contribute it.

github.com/sbromberger/LightGraphs.jl

Is a vertex newly added to an AbstractGraph guaranteed to have a vertex index |V| + 1 ?

opened 06:59PM - 14 Dec 20 UTC

closed 06:17PM - 15 Feb 21 UTC

CameronBieganek

I know that `add_vertex!` is not part of the `AbstractGraph` interface, but the …documentation seems to use an implicit assumption that when you add a vertex to a graph, it is denoted by the integer `|V| + 1`, where `|V|` is the number of vertices in the graph before the new vertex is added. Is this behavior assumed to apply to any `AbstractGraph`? Of course, the examples in the documentation use `SimpleGraphs`, so it's not clear if this assumption applies to any `AbstractGraph` or only to any `AbstractSimpleGraph`. For example, if this behavior is not guaranteed for an AbstractGraph, then the value of `added` in the code below is undefined: ```julia struct SomeGraph <: AbstractGraph # etc end g = SomeGraph(2) add_edge!(g, 1, 2) add_vertex!(g) added = add_edge!(g, 1, 2) ``` In other words, without the guarantee above, it's possible that when `add_vertex!` is called above the new vertex is given label `1` and the old vertex labels are incremented by 1, e.g. `1 -> 2` and `2 -> 3`. If that were the case, then the second `add_edge!` would be successful and we would have `added == true`, and the list of edges would now be `[Edge(1, 2), Edge(2, 3)]`. However, if the new vertex index is `3`, then the second `add_edge!` would be unsuccessful and we would have `added == false`, and the list of edges would be `[Edge(1, 2)]`. More generally speaking, it's rather difficult to keep track of vertices if we don't know what happens to vertex labels/indices when we add a new vertex.

In my opinion, add_vertex!, rem_vertex!, add_edge!, and rem_edge! all need to be added as optional parts of the AbstractGraph interface. There needs to be a generic API for those functions that can apply to any mutable graph.

gdalle · November 9, 2021, 9:41pm

Indeed you’re right, and if you look at the developer notes defining the API, you’ll see that these modifying functions are in fact optional.

As for the interface, I opened https://github.com/JuliaGraphs/Graphs.jl/issues/35 to talk about it, feel free to contribute

gdalle · November 9, 2021, 9:44pm

Have you tried building the edge arrays first and only constructing the graph once they’re complete? Something like

sources = [1,2,1]
destinations = [2,3,3]
weights = [0.5, 0.8, 2.0]
g = SimpleWeightedGraph(sources, destinations, weights)

might work better because you wouldn’t need to modify a sparse matrix for every insertion, you’d only push into an array.

jlapeyre · November 9, 2021, 9:50pm

No. It’s worth trying.

CameronBieganek · November 9, 2021, 10:00pm

Well, they’re optional in the sense that they’re not mentioned at all in the API. What I mean is that they should be mandatory for mutable graphs (which is most graphs), and a generic API for those functions should be defined for all AbstractGraphs.

mtreinish · November 12, 2021, 4:37pm

FWIW, I wrote those benchmarks primarily in support of our paper submission to JOSS: [2110.15221] rustworkx: A High-Performance Graph Library for Python I plan to work with @jlapeyre after the submission and review process is over to add Julia to that benchmark suite (and also to expand the benchmarks to cover additional use cases and algorithms)

I would like to better understand the data structures used by petgraph to see what their role is; I suspect it’s large.

For petgraph the data structure retworkx is using (because petgraph actually offers several different types) is the StableGraph struct which is basically an adjacency list. The core of the data structure is 2 Rust Vecs containing nodes and edges. The edges for a node are stored kind of like a linked list of edge indices with the first edge index being stored in the Node struct and then each edge struct stores the index of the next edge. The stable graph part is just built on the Graph struct (which I used for the links above) and basically maintains a removed node and edge list to invalidate and preserve indices on removal while a normal Graph struct will swap_remove (to avoid the left shift overhead) which changes all the indices on removal. This ends up being quite performant because at the end of the day most of the graph operations are just operating on a Vec which is very quick (and works nicely with CPU caching).

I wrote the betweeness centrality function for retworkx (in the Rust layer). It was the first thing I ever wrote in Rust. But, at the Python level, it outperformed LightGraphs by 50% or so in tests I made while developing.

retworkx will be be a bit faster now that we introduced the multithreaded implementation of betweenness centrality.

For retworkx if you’re not interacting with node weights there is almost no overhead from python aside from the python function call overhead and any argument copy and type conversion from python ->rust (which except for complicated or large data inputs that need to be copied and/or converted is on the order of 100s of ns) and potentially the same copy/convert for the return to python (but in this case it’s ameliorated by using a custom rust pyclass return that avoids the copy/convert until individual elements are accessed). This is because retworkx stores the node and edge weights as basically pointers to the Python heap, so we only interact with Python when we need to inspect a weight object. So for betweenness centrality where it’s all isolated to the structure of the graph it’s basically all native rust performance since you’re working solely in rust without needing to check node or edge weights.

For other algorithms that need to interact with weights (like edge weights in a shortest path) we have more overhead because we have to use a callback to python to get a statically typed weight from a given python object. You can see this in the benchmarks, retworkx is a bit slower for the single source shortest path and I’m like 90% sure that’s the python callback overhead.

But, I stalled a bit because despite some effort, I was unable to get a Julia implementation that is competitive with any of the entrants in that repo on the first step, which is graph construction.

For those benchmarks I wanted to be as fair as I could for all the python libs so I didn’t use a reader written in rust (which would be significantly faster) and just opened the file in Python and manually called graph.add_node() and graph.add_edge() (or the equivalents) in each python library while iterating over each line (with the exception of python-igraph which creates a networkx graph first and converts it, for whatever reason this was significantly faster than creating it directly). This way each lib had a baseline of python’s file reader performance and iterating over the file. I’ve not really worked in Julia much, but is doing normal text file io and calling the individual graph library calls not an option, I would assume Julia’s file I/O is faster than Python’s. Or is that what you were trying?

jlapeyre · November 12, 2021, 4:58pm

I guessed you did it this way because it was simple and perhaps a native reader would be overoptimizing compared to constructing the graph. I think writing the reader in rust (with a Python interface) is perhaps fair. It certainly would be if the graph format were a solid standard (it is not). I was spending some time getting the parsing as fast as I could in Julia (I posted here ) But, constructing a graph with weights swamped this time. It would be easy to test your python reader at just parsing (has the advantage of being very readable code) vs. Julia at just parsing.

Sukera · November 12, 2021, 5:03pm

Julia can call into Rust just fine (e.g. via RustCall.jl or provided the rust library gives us C bindings to call), so that should definitely work. The trouble is of course that we often want our cake and eat it too - in this case, that means having multithreading compose. While julia composes multithreading internally just fine, it doesn’t across a ccall boundary (or any FFI boundary, really). I’m guessing that’s where a lot of the desire for having a julia-native version comes from.

Other than that, we love us some speed in julia as well, so sometimes it’s just for the challenge of it

mtreinish · November 12, 2021, 5:25pm

I was actually asking about this more as an approach for using the Julia graph library, but that’s because I misread the OP and didn’t see most of the time was spent in the actual graph object construction.

That being said, I hadn’t actually thought about calling to into petgraph/retworkx from Julia to reuse the implementations there. It’s an interesting idea, and adding a native C FFI to retworkx is something I’ve had as a long term TODO see: https://github.com/Qiskit/retworkx/issues/33 Although when I opened that issue it was primarily to expose a C api to other compiled Python extensions (like numpy’s C api that cython and other extensions can use). But, now that we have a native rust library maybe just putting a more generic C api in front of that would be useful too.

zsz00 · November 13, 2021, 4:24am

what is RustCall.jl, what’s the repo url ??

zsz00 · November 13, 2021, 4:51am

I saw a comparison before:

Benchmark of popular graph/network packages v2

Topic		Replies	Views
The graphs ecosystem Graphs lightgraphs , metagraphs , optimization , graphs , metagraphsnext	35	3623	August 6, 2024
[RFC] GraphInterface.jl: An interface proposal for Graphs.jl version 2.0 Graphs lightgraphs , graphs	29	1701	October 8, 2023
Creating a Weighted Graph Graphs lightgraphs , graphs	33	5948	June 16, 2020
Networkx & lightgraphs.jl shortest path benchmark Graphs question , package , lightgraphs	7	4118	May 11, 2020
Performance of weighed graphs Graphs metagraphs , algorithm , graphs	9	1063	July 12, 2022

Retworkx: new, high-performance Python graph library

Benchmark of popular graph/network packages v2

Related topics