[RFC] GraphInterface.jl: An interface proposal for Graphs.jl version 2.0

Hi, all! I’ve created a new package GraphInterface.jl as a proposal for the new AbstractGraph interface for Graphs.jl version 2.0. I know that there is ongoing work in the GraphsBase.jl repository on the new version 2.0 interface, but my proposal differs in some respects, so I felt it would be easiest to create my own repo containing my proposal. Please let me know what you all think about the inteface proposal in GraphInterface.jl.

The GraphInterface.jl package is a very minimal package containing mainly just function stubs and docstrings, along with a few convenience methods. An example type that implements the AbstractGraph interface is given in a separate GraphTypes.jl repo. I’ve also implemented a few basic graph algorithms in a GraphAlgorithms.jl repo. The algorithms are implemented in accordance with the AbstractGraph interface. (One of the algorithms is the Dijkstra shortest path algorithm. It was ported from Graphs.jl, and thus the Graphs.jl license is included in the GraphAlgorithms.jl license.) The dependency relation among the three packages is as follows:

GraphTypes <- GraphInterface -> GraphAlgorithms

where arrows point from dependencies to dependents. In other words, GraphTypes.jl and GraphAlgorithms.jl are independent of each other. (The Project.toml file for GraphAlgorithms.jl is misleading in this regard. I got tired of fiddling with local dependencies and just added GraphTypes.jl as a direct dependency so that I could get the unit tests running.)

A few points of interest regarding the interface:

  • The interface allows vertices to be of any type. Or rather, a given graph instance will have a vertex type defined by eltype(vertices(g)).
  • Edges are iterable. To retrieve the vertices that define an edge e, use u, v = e.
  • Vertex and edge membership can be tested with v in vertices(g) and e in edges(g), respectively.
  • Vertex and edge metadata are not needed for writing generic graph algorithms, so they are explicitly excluded from the AbstractGraph interface. Graph types are, of course, free to implement their own interface for metadata if they so choose.
  • There are no optional methods in the interface. For clarity and simplicity, all types that implement AbstractGraph are assumed to be mutable. In other words, they must implement add_vertex!, add_edge!, rem_vertex!, rem_edge!, and set_weight!.
  • I only implemented an undirected graph type in GraphTypes.jl, so I haven’t really fleshed out the interface for directed graphs in GraphInterface.jl. Probably a few more methods will need to be added, like is_directed, in_neighbors, and out_neighbors.

Further details on the interface can be found in the README.md and docstrings for GraphInterface.jl.

Here is an example of the interface in action for the Graph type from GraphTypes.jl:

Example of interface implemented in GraphTypes.jl
julia> using GraphTypes

julia> g = Graph{Char}()
Graph:
    Vertex type: Char
    Number of vertices: 0
    Number of edges: 0

julia> add_vertex!(g, 'a')
Graph:
    Vertex type: Char
    Number of vertices: 1
    Number of edges: 0

julia> add_edge!(g, 'b', 'c')
Graph:
    Vertex type: Char
    Number of vertices: 3
    Number of edges: 1

julia> add_edge!(g, Edge("cd"))
Graph:
    Vertex type: Char
    Number of vertices: 4
    Number of edges: 2

julia> collect(vertices(g))
4-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
 'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

julia> collect(edges(g))
2-element Vector{Edge{Char}}:
 c -- d
 b -- c

julia> 'a' in vertices(g)
true

julia> Edge("bc") in edges(g)
true

julia> weight(g, 'b', 'c')
1.0

julia> neighbors(g, 'c')
Neighbors: {d, b}

julia> e = first(edges(g))
Edge:
    c -- d

julia> u, v = e;

julia> u
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

julia> v
'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)

Finally, I ran a few benchmarks that compare the performance of GraphTypes.jl + GraphAlgorithms.jl to Graphs.jl. The results are summarized in the following table.

Algorithm Graphs.jl (ms) GraphTypes.jl + GraphAlgorithms.jl (ms) Slowdown Factor
Dijkstra shortest path 8.155 15.7 1.9
Graph union 6.853 24.745 3.6
Graph intersection 7.045 9.312 1.3

As you can see, the performance of GraphTypes.jl + GraphAlgorithms.jl is currently noticeably worse than the performance of Graphs.jl. But there are probably some ways to improve the performance. Suggestions and PRs are welcome.

Perhaps one way to improve the performance would be to use the equivalent of a Graphs.SimpleGraph under the hood, and then add two methods to the AbstractGraph interface:

  • coded(g::AbstractGraph): Returns an object that satisfies the immutable portions of the interface (but not add_vertex!, rem_vertex!, etc) and has integer vertices from 1:nv(g).
  • mapper(g::AbstractGraph): Returns a bijection that translates back-and-forth between vertices (which are of type eltype(vertices(g))) and integer vertex codes (from 1:nv(g)).

That way we keep the generic interface of GraphInterface.jl, but under the hood algorithms can still use simple integer coded graphs with vertices from 1:nv(g). I think that approach would preserve most of the performance seen in Graphs.jl.

15 Likes

May I ask you to elaborate on the difference between your proposal and GraphsBase?

1 Like

Thanks for your work @CameronBieganek !

I’ll do my best to outline the differences based on both project descriptions.
Keep in mind that GraphsBase.jl and GraphInterface.jl are just preliminary proposals (at least that’s how I view them), so in the end I hope there will be just one package which does whatever the community thinks is best.
There are plenty of open GraphsBase.jl issues you can weigh in on, in addition to the current thread.

TLDR: GraphInterface.jl sounds cool, and I’m excited to look at the code! However, there are some key aspects that GraphsBase.jl aims at tackling:

  • support for metadata (to decrease user confusion)
  • support for multigraphs (which makes sense e.g. for transportation, where you can have various edges between 2 nodes)
  • interface with optional parts, formally specified and with tests (I’m betting on Interfaces.jl to help us get there)

What do people here think?

GraphsBase.jl will probably contain an interface and base types.
The idea is to offer at the very least SimpleGraph, SimpleWeightedGraph and MetaGraph in one place, hoping that the overhead of these struct definitions is fairly negligible.

Same in GraphsBase.jl, that is indeed one of the main goals.

Possible in GraphsBase.jl, although edges can be arbitrary objects.

Possible in GraphsBase.jl, but we would like to try and handle edge multiplicity, which raises the question of how to check if two edges are “equal”.

Strong disagreement 1: last time I checked there were 15 different packages for graphs with metadata, which is immensely confusing for the users. I think the only way to prevent that is to offer a common interface and a solid default MetaGraph type.

Strong disagreement 2, some graph types simply do not support vertex addition (think of GridGraphs.jl). The question of interfaces with optional parts is an unsolved one at the moment, but I’ve been working closely with @Raf on his package Interfaces.jl, which will soon be usable for graphs.

GraphsBase.jl takes directedness into account but the best mechanism for traits-based dispatch is not yet clear. In addition, the question of directed vs undirected edges has never been handled consistently.

That is also the main worry for GraphsBase.jl, and a good first step would be to clean up the Graphs.jl benchmarks. We also need to define the basic algorithms that we want to test for regressions in a 2.0 interface (cause we won’t be able to test them all while we iterate on the design).

That is an interesting suggestion! There was an old branch of mine on MetaGraphsNext.jl which took an atlernate route:

  • keep all algorithms defined on integers
  • add a macro to be able to call them and retrieve their results with actual vertex objects
dijkstra(g, 1, 2)
@labels dijkstra(g, 'a', 'b')
6 Likes

Yep, I’m definitely not trying to splinter the ecosystem here. I have another side project that I want to work on, so my goal is to present the proposal in GraphInterface.jl, discuss it with the community, and then fade away and let the Graphs.jl maintainers do what they think is best. :slight_smile:

When I started developing GraphInterface.jl, I wanted to design it in such a way that it could cover both multigraphs and hypergraphs. But eventually I decided that it was too tricky to come up with a simple interface that could handle all the various possible graph types, so I decided to stick to simple graphs for the time being.

@gdalle Questions for you:

  • Are you looking for an interface that can support hypergraphs?
  • Wikipedia notes two notions of multiple edges: “edges without own identity” and “edges with own identity”. In other words, edges between u and v can be either distinguishable or indistinguishable. Which of those two notions are you planning to support in Graphs 2.0? Or both?

GraphInterface is already pretty close to supporting multigraphs with distinguishable edges. For example, you could create a multigraph type where you can do add_edge!(g, Edge('a', 'b', 2)) to add an edge between 'a' and 'b' with edge ID 2. And then you can do Edge('a', 'b', 2) in edges(g) to test for edge membership. However, to truly be consistent, the interface would need to disallow method signatures with u, v, e.g. add_edge!(g, u, v), rem_edge!(g, u, v), weight(g, u, v), etc would have to be disallowed, because those forms know nothing about the edge ID. (add_edge!(g, u, v, w) is currently taken and means to add an edge with a weight, not an ID.)

I dislike src and dst methods on edges, because the names “source” and “destination” don’t really make sense for undirected graphs. Furthermore, if we want to allow undirected hypergraphs, there is no sensible way to divide the multiple vertices connected by an edge into “source” and “destination” categories. The one place where something like src and dst seems necessary is directed hypergraphs.

I totally agree that some graph types are naturally immutable. My assertion in GraphInterface that all subtypes of AbstractGraph are mutable was based more on convenience and on where the interface and example implementation happen to be at the moment.

That being said, immutability does raise some difficulties. For example, in GraphAlgorithms, I implemented union and intersect by using empty, add_vertex!, and add_edge!. One could build up the union/intersection of vertices and nodes in a separate data structure and then instantiate a new graph, but that would typically be more memory intensive. Also, that would require adding to the interface a rule for graph constructors—something like this:

Each graph type, `T`, must provide a constructor with the signature
`T(vs, es)`, where `vs` is an iterator of vertices and `es` is an
iterator of tuples that represent edges.
2 Likes

In my defense, for most of the time period where I was developing Graph[Interface/Types/Algorithms], the initial PR in GraphsBase used the old add_vertex!(g) style, which seemed to preclude using an arbitrary vertex type.

In fact, I also created my own metagraph type a few years ago. Here’s a demonstration of my preferred interface for metadata:

# User defined vertex metadata type.
# Could use a named tuple instead.
mutable struct VertexData
    a::Int
    b::String
end

# User defined edge metadata type.
# Could use a named tuple instead.
mutable struct EdgeData
    c::Float64
    d::String
end

g = Graph{Char, VertexData, EdgeData}()

# ... Add vertices, edges, and metadata to graph.

# Access vertex metadata:
g[u].a

# Access edge metadata:
g[e].c
g[u, v].d

# Set vertex metadata:
g[u].a = 42

# Set edge metadata:
g[e].c = 3.14
g[u, v].d = "hello"
1 Like

No doubt on my mind :wink: And I genuinely appreciate the proposal!
Pinging @simonschoelly and @etienne_dg in case they wanna take a look too (although the latter is prepping for his PhD viva, best of luck to him!!!).

The initial goal of the interface as designed by Etienne was to support multigraphs (which can have multiple identifiable edges between each pair of vertices) but not hypergraphs (where an edge is a set of vertices with any cardinality)
That is why the basic functions are outedges and inedges instead of outneighbors and inneighbors.

In the last community meeting, it surfaced that multigraphs seem relatively easy to support in the current state of affairs with a minor hack to Graphs.jl (I believe it was @filchristou who pointed it out). Therefore it seems reasonable as an objective, but it already requires rethinking many algorithms (typically a path is now a vector of edges and not just a vector of vertices).

However, hypergraphs are an even bigger challenge and I don’t really know if we can tackle it. Feel free to give your opinion in the corresponding issue.

You and me both. I think a method like other_vertex: (e, u) -> v would make more sense, and be easier to generalize. But I don’t know how badly it affects the codebase. Again, there is an issue for that to discuss further.

Is there a particular reason not to just declare in the interface that edge types must be iterable? The syntax u, v = e seems pretty clean and elegant to me, and it avoids calling one vertex the source and one the destination.

You’re right, at the moment many algorithms rely on mutation (or only work for SimpleGraph), as Simon found out when he heroically switched many tests to a GenericGraph object with minimal properties.

In theory, nothing stops us from implementing intersect without mutation, but in practice it’s easier when we don’t know what the constructors for the concrete graph type look like.
I like your idea of forcing the existence of a constructor from an iterable of vertices and one of edges (plus metadata cause that’s my jam).
Another elegant approach is to have non-allocating graph views for such operations.

No need to defend yourself! It’s true that GraphsBase.jl is the draft of a draft at best, but discussions such as the one you started are exactly how we’re gonna improve it.

I kinda like it :slight_smile: But the most important thing is to agree on one syntax and stick to it. There’s more leeway in this department because as you said it doesn’t show up in the core algorithms, so the differences between the 15 packages I mentioned are mostly matters of taste.

2 Likes

A topical meme I made for my talk this week at the Julia and Optimization Days in Paris:

4 Likes

I like the simplicity of GraphInterface.jl. The design is similar to my attempt at a graph type with labelled/named vertices in NamedGraphs.jl. It’s based on a bijection between vertices and internal integer vertices like your proposed mapper(g::AbstractGraph) function (I use an interface based around functions vertex_to_parent_vertex(g, v) and parent_vertex_to_vertex(g, v) to map between the outer vertices and the vertices of the parent graph, which would generally have linear indexing like SimpleGraph from Graphs.jl).

I think this is closely related to efforts in the Julia ecosystem around generic indexing of arrays, for example DimensionalData.jl and AxisKeys.jl. I think we could take inspiration from those packages around defining generic indexing, using the analogy that a SimpleGraph is like a SparseMatrix, so a graph with labelled vertices is like a KeyedArray or DimArray wrapping a SparseMatrix allowing more general indexing. See `Dictionary` that can be indexed either with a key or with an integer · Issue #113 · andyferris/Dictionaries.jl · GitHub for a related discussion on an interface for directly accessing the inner data, circumventing the outer labels.

As @CameronBieganek alluded to in the first post, I think there are 3 main ways of making graph functions work with general vertices:

  1. assume generic vertices but make no other assumptions, in which case it will likely involve a lot of mapping back and forth between vertices and parent vertices of some wrapped SimpleGraph within the code and be slower than necessary, but it will “just work” if there isn’t an underlying data structure with linear indexing,
  2. wrap and unwrap the parent graph/vertices as pre- and post-processing, which requires a generic interface for wrapping and unwrapping graphs/vertices, or
  3. allow some interface for directly accessing an underlying graph structure in a fast way with linear indexing (which probably also requires an interface for rewrapping/mapping the parent vertices, since many graph functions output lists of vertices or edges, so is closely related to 2.).

It seems like a design strategy could be to have a trait analagous to Base.IndexStyle which defines what kinds of indexing the graph structure has, with fast codepaths if it has linear indexing (either externally or internally, in which case there could be an interface for linearly indexing into a labelled graph where that is possible, say with a specialized type LinearVertex(v) or OrdinalIndex.jl). I could imagine that most of Graphs.jl code could still work on labelled graphs if they were rewritten slightly to access linear “parent vertices” when those are accessible. It could go a long way to turn current Graphs.jl code into trait functions specialized on an IndexLinear trait.

I responded about metadata in Location and access for metadata · Issue #14 · JuliaGraphs/GraphsBase.jl · GitHub.

2 Likes

@gdalle I think it is ok if there is a plethora of metagraph types, I think that is inevitable since different applications require different data structures to be fast (like how there are an endless number of Julia AbstractArray subtypes at this point). I think the key is:

  1. The AbstractGraph interface and functions allow many of them to be used in standard graph functions, especially ones that don’t require metadata (i.e. ones that just need to know about the structure of the graph connectivity through functions like neighbors), if they overload some minimal number of interface functions.
  2. Have a solid standard metagraph implementation that is very general and as fast as possible given that generality that users can “just use”. This can act as a common fallback data structure that other metagraphs can be converted to (analogous to Julia’s Array type), or used as a fallback data structure if users didn’t define constructors like empty or similar for their own metagraph type, say in functions that return new graphs like when merging two metagraphs.

Also we could consider having a trait MetaDataStyle with subtypes NoMetaData, VertexMetaData, EdgeMetaData, VertexEdgeMetaData, etc. which specializes certain functions that might need to access metadata (like merging graphs), which could make use of generic graph metadata accessor functions but ignore them if the graph doesn’t have metadata, for example if it is a SimpleGraph.

1 Like

This is just an oversight and rather easy to add. But I don’t see how it solves the problem for undirected edges?

It’s a good summary!

At the moment I think GraphsBase.jl and GraphInterface.jl do 1, and we could add 2 without much sweat.
The trouble with 3 is that outneighbors / inneighbors (or outedges / inedges) are not necessarily indexable cleanly, or even at all. Some graph types generate them on the fly, and forcing indexation would be an additional interface constraint. But if we can manage that, I agree that it sounds like the best of both worlds.

Etienne’s original idea in GraphsBase.jl was to have a create_vertex_container(g, T) method that generates the best container with values of type T and keys of vertex type. That, way when Dijkstra wants to e.g. store vertex distances, it can do so in a Dict (default fallback) or in a Vector (if the vertices are contiguous integers) depending on what the graph type supports. This trick allows at least integer-based graphs to be fast, instead of everyone using dictionaries everywhere.

I opened an issue to discuss these integers-to-labels aspects.

That sounds like a good idea. If there was a VertexStyle/IndexStyle trait the container could even be defined automatically based on that trait (i.e. LinearVertices could use Vector and GenericVertices could use Dict by default).

Requiring a method from the user or requiring a trait doesn’t change much in the grand scheme of things.
But what changes a lot is what we index stuff with in the algorithms themselves:

  1. the vertex object
  2. an explicit vertex index
  3. an implicit vertex index given by the memory layout of vertices(g) or neighbors(g, u)

Another issue with 3 is consistency: the index of v in vertices(g) is not the same as its index in neighbors(g, u). I shudder to think of the headaches this would cause.


More generally, I’m also worried that we might make the interface way too complicated, especially by adding lots of traits (which also mess with the readability of stack traces).
If we disregard efficiency considerations, I think something like this could actually be the minimum required set of methods (names are provisional).
Nearly everything else can be defined based on those:

  • vertices(g)
  • edges(g)
  • edges(g, v): edges linked to a vertex v
  • neighbors(g, e, v): vertices reachable from v through the edge e
  • data(g, v) and data(g, e)

The more requirements we add, the harder we make it to understand the code or extend it.
That’s also why I like option 2 for vertex to integer translation: pay a cost once at the beginning of an algorithm and at the end, but in the middle it’s business as usual.

1 Like

Agreed that traits shouldn’t be overused, i.e. they should only be required for efficiency but not for the minimal interface. The reason for my suggestion was just to avoid having an extra interface function create_vertex_container if an existing trait could suffice, and a trait defining the indexing/vertex style would be useful elsewhere.

Agreed that those functions you list capture the gist of a minimal interface, but then for efficiency it seems like we also need more information about how to work with vertices in a fast way based on the indexing style of some underlying data structure, and therefore need some information about mapping between the external vertices and internal fast vertices.

Sorry, which problem are you referring to? (Honest question.)

I admit the difference between u, v = e and u = src(e); v = dst(e) is mainly cosmetic. But I think conceptually “source” and “destination” don’t really make sense for undirected graphs, and they don’t work at all for undirected hypergraphs.

For an undirected hypergraph, one could get the vertices connected by an edge via iteration:

u, v, w = e

Though I tend to agree that supporting hypergraphs in the AbstractGraph interface might be too ambitious.

2 Likes

I really like the simplicity of this interface.

I have thought about this problem at length in the past and concluded that I really like the following idea:

  1. Force every graph type internally to carry a 1:nv(Γ) structure
  2. If you like an arbitrary labelling, this can be carried in an additional vector Vector{LabelType} and we carry Dict{LabelType, T <: Integer} for the inverse mapping.
  3. If you wish to have meta-data on the vertices, you can carry it in a vector Vector{MetaDataType} with a corresponding dictionary.
  4. Edges are iterable and destructurable into u, v = e or u, v, mult = e as mentioned above. The interface need not specify whether this is sorted, and we can leave it to the implementation to do so. Note that the interface would have a function neighbo(u)rs(v).
  5. Metadata on Edges? Same approach as (3).
  6. Traits are essential for IsDirected, IsMultigraph

The main trade-off is for Graph Unions and Intersections, which is a potentially painful operation (one need to compare the LabelType basically).

However, the advantage is that all the algorithms can be made efficient. If people dislike the lack of generic-ness internally, there is no reason for the internals to be expose to a user. We could easily have a wrapper that makes the user-facing behaviour nice. One may argue that there are certain algorithms that really want the idea of vertex/edge-contraction. But an intermediary of an algorithm need not be stored.

The other potential issue is that you might think vertex deletion is very painful. However, that isn’t the case. One just have to rename the |V(Γ)|-th vertex to the deleted vertex. The cost is d(v) where v is the |V(Γ)|-th vertex. (Recall that in the user-facing world, its merely a dictionary update).

3 Likes

I was referring to the fact that for an undirected graph, both u, v = e and v, u = e are possible

Well, in that example, u and v are just variable names, which are immaterial. The important point is the order of iteration. For example, Edge('a', 'b') iterates 'a' and then 'b'. But all we have to do is add a note to the docstring for AbstractUndirectedEdge that says that the order of iteration is undefined. Then the behavior will be analogous to sets:

julia> foreach(println, Set([1, 2]))
2
1

In other words, both Set and Edge are iterable, but the order of the elements in the iterator is undefined.

Set has the following note in its docstring:

The order of elements in a Set is an implementation detail
and cannot be relied on.
2 Likes

Maybe I am misunderstanding what you are saying, but forcing every graph type to have vertices labeled indexable by 1:N seems quite restrictive to me. Consider the example of some game modelled as a directed graph where the vertices are states of the game and edges are given by allowed moves. In this setting it might be very hard to even compute the number of vertices (or it might be infinite) but still it makes total sense to apply e.g. Dijkstra or other search algorithms.

That’s why I like @gdalle 's minimal interface. It really captures the bare minimum a graph data structure needs to do. I don’t think that mapping to integer vertices is possible in general so I don’t think we should require it in general. We should offer alternative, fast code paths when a graph either is indexable by integers or mappable to another graph that is. We could do this either via trait (more flexibility but more complexity) or via a different abstract type.

2 Likes