[RFC] GraphInterface.jl: An interface proposal for Graphs.jl version 2.0

Supporting it directly might be ambitious, but thinking forward so that an extension package would integrate easily and support them would be great.

My skewed view in network science seems to indicate that higher order interactions are becoming more and more popular.

My 2-cents.

I didn’t like the focus on mutable operations in GraphInterface.jl, but I liked the other bits!

Constructing graphs a vertex/edge at a time is typically a recipe for performance problems and/or multithread contention problems. So I’d rather see “batch” operations in the GraphInterface.jl in the requirements. (Batch to one at a time is easy to implement, but the reverse is hard.) But I’d more strongly suggest a split between mutable graphs and algorithm interface wrappers…

I’d also encourage an interface that makes it easy to convert among graph types. In my experience, the -one-graph-type-to-rule-them-all- is impractical and you always end up converting at some point. (in-neighbors to out-neighbors for various algorithms… add complex metadata for flow algorithms… add specialized weights for more complex implementations). Hence the meme.


Here were some notes I typed up a few years ago when I was thinking about graphs in Julia. These are in no way complete It was a list of ideas to think about. This is heavily focused on non mutable graphs (but that may wrap underlying mutable containers). Never really got around to doing anything with them. Figure there are better here than on my computer. My apologies if these ideas are now duplicative of others (who likely proposed similar and better ideas). There were definitely lots of inspiration for it (e.g. old BGL ideas) and other graph libraries in Julia and elsewhere. (e.g. these are similar to @gdalle’s notes above.)


GraphViews.jl

- AbstractGraphView{}

- AbstractIndexedGraph <: AbstractFiniteGraph <: AbstractGraph
                          edges(G)               neighbors(G,i) - ?


graphview(df::DataFrame, src=:col1, dst=:col2)
graphview(edgelist::Vector{Pair{X,X}})
graphview(edgepair::Pair{Vector{X},Vector{X}})
graphview(A::SparseMatrixCSC)
graphview(src::Vector{X}, dst::Vector{X})
graphview(src::Vector{X}, dst::Vector{X}; )

graphview(NF::T) where T <: AbstractNeighborFunction

graphview(NF::Function, )

edgesview(data...)
neighborsview(data...)


neighbors(G,i::)
edges(G)

summary(G), census(G), ? 

----

Lots of good ideas for graph data in the Meshes.jl

meshdata()... -> 
graphdata(...) -> 


----

connected components
betweenness centrality
layout
graph eigenvectors
maxflow

-- pagerank
-- 

collect(breadth_first_search(Network(data)))

pagerank(Network(data))

pagerank()

apsp()

implicit_apsp

x = DynamicNetwork()


data -> algorithm -> output

GraphBase -> Setup for Vector{Pair{X,X}} as a graph. (Restrictions on X?)
Setup for 
Vector{Pair{X,X}}: EdgeListGraph
Pair{Vector{X},Vector{X}}: EdgePairGraph as graph? 

Setup for Pair{DataFrame, DataFrame} as a graph. 

Tuple{}

vertex_census()
edge_census()
graph_census() # vertex,edge census + connectivity


incidence_matrix() --> Sparse, VertexMap 
adjacency_matrix() --> Sparse, VertexMap


GraphPipelines # lazy strategies
GraphData # I/O: Snap, GML, Matrix Market, EdgeList, CSV, AdjList, 
GraphDataStructures # EdgeList, Adjacency, 

collect(G) # gives a list of vertices
collect(edges(G)) # gives a list of edges
collect(edges(G))

eltype(G) # vertex type


---

Step 1: Load a graph.
- GraphData # I/O: Snap, GML, Matrix Market, EdgeList, CSV, AdjList, Metis, ... 

using GraphData/GraphIO
edges = load("file.edges"; format=fmt"SNAP") 


---

Step 2: What is GraphBase?

In Julia:
Images are Matrix{X} where X <: "Pixel" 

What's a graph?
An array of edges. 


Vector{Pair{X,X}}: EdgeListGraph #edges are "columns" 
Pair{Vector{X},Vector{X}}: EdgePairGraph as graph?  # edges are row...

vertex_census()
edge_census()
graph_census() # vertex,edge census + connectivity

collect(vertices(G)) # gives a list of vertices
collect(edges(G))    # gives a list of edges

# degree(G, i::X) = mapreduce(...)
degree(G, i::X) = 
neighbors(G, i::X) = ?



---

GraphDataStructures # EdgeList, AdjacencyList..., SetOfSets... 

---

GraphAlgorithms


connected components
betweenness centrality
layout
graph eigenvectors
maxflow


--- Some specific scenarios (These are new based on the post here...) 

Translation among graph formats.
--> You will always have to translate at some point.

Make it easy to run simple algorithms on simple types.
e.g. the following should be usable as graph objects
Dict{Int,Set{Int}}()
Vector{Set{Int}}()
Vector{Vector{Int}}()

with 

vertices(G::Dict{Int,Set{Int}}) = keys(G) # or something similar
vertices(G::Vector{Vector{Int}}) = eachindex(G) # or something similar 
vertices(G::Vector{Set{Int}}) = eachindex(G) # or something similar 

but also implicitly defined graphs. 

struct LineGraph
  start::Int
  end::Int 
end 
vertices(G::LineGraph) = start:end
function neighbors(G::LineGraph, i)
  if i < start
    return nothing
  elseif  i > end
    return nothing 
  elseif i == start
    return (i+1, )
  elseif i == end 
    return (i-1, )
  else
	  return (i-1, i+1)  
  end 
end
edges(G::LineGraph) = GraphInterface.forall_vertices_get_neighbors(G) # this could be a simple implementation. 

Sometimes you don’t have a choice and are forced to build up a graph dynamically. For example, a few years ago I wrote some code to iteratively build up a citation network by querying the Scopus API.

That’s an interesting example. I had hoped that most graph types could provide coded and mapper methods, as proposed in my OP, so that algorithms could take the fast path, but I guess for the scenario you describe (and for infinite or size unknown graphs in general) there is no choice but to take the slow path. And that means that every algorithm in Graphs.jl would need separate code for the fast path and the slow path, which could double the size of the code base unless there are non-awkward ways to combine the fast path and slow path into the same generic method.

…It’s hard to develop a simple and performative interface that satisfies every use case…

For context:
I haven’t done any performance profiling, but I believe that in my shortest_path benchmark the main reason the slow path is slower than the fast path is not because of the difference in the underlying graph data structures (GraphTypes.Graph vs SimpleWeightedGraphs.SimpleWeightedGraph) but due to the fact that the generic algorithm (the slow path) uses auxiliary dictionaries for dist, parent, and visited, whereas the fast simple graph algorithm uses auxiliary vectors for dist, parent, and visited. So the generic algorithm is paying the insertion cost of the sets and dictionaries.

would need separate code for the fast path and the slow path

phrased this way it sounds hacky, but tbh I think could be somewhat naturally expressed with some kind of ismaterializable trait. each graph “is” then either

  1. a sparse matrix (can hold vertices as index range)
  2. a myopic state machine (must query / generate edges on each new vertex)
1 Like

I do agree it’s restrictive, but I think not forcing this leads to slow code for everything else. I think we’re all slowly coming into the same realisation that perhaps it is not the smartest for everything to have the same interface.

By the way, your use case is very different, as exploring a vertex modifies the graph, which I believe makes all the generic algorithms easy to make mistakes. Moreover, there is no guarantee the choice of a graph algorithm makes it run in finite time.

I think I’ll make my own graph interface :stuck_out_tongue: with my proposal. We can fuse it together with a trait or some subtyping in this case.

Graph
-> FastGraph (1:V(G) indexing as per my notes) {Abstract Type}
-> GraphTypes.Graphs

With the generic Graphs interface as per Cameron’s method, and the FastGraph subtyping of my proposal. Then the generic code can run on whatever they please, while all the fast algorithms for the restriction require the FastGraph abstract type.

I just wanna point out that while everyone is free to design their own interface, it is only valuable if there are lots of algorithms that can exploit it. At the moment, the main source of algorithms is Graphs.jl, and that seems unlikely to change.

Given the lack of maintainer time compared to the size of the codebase, a complete algorithmic rewrite is out of the question. This means we need existing code to work with minor adaptations. In fact, rethinking the interface is already a major undertaking, that may even prove too breaking or time-consuming to complete.

It’s awesome that this topic has sparked a lot of discussion, both here and on GitHub. But I’m begging you: if you have energy to spare, use it to try and find a common ground that is both consensual and achievable. That might require compromising the ideas which would have gone into a solo project, but you get to reap the benefits of a whole ecosystem in return.

6 Likes

Yeah, that’s a good heuristic for determining what should be in scope for the 2.0 interface and what should be out of scope. After a quick scan of a few algorithms in Graphs.jl, it does appear that many of them assume that the input graph is finite and has a known size. So, it’s probably best to exclude infinite graphs and graphs of unknown size from the 2.0 interface. It might even be a good idea to continue to exclude multigraphs in the 2.0 interface. Perhaps multigraphs can be accommodated further down the road in Graphs v3.0? :sweat_smile:

If we assume graphs are finite and have a known size, then the coded and mapper approach seems like a win-win. It mostly maintains the current performance and it minimizes the amount of changes needed to the existing code base. Most algorithms would only need to add some minor pre- and post-processing code.

1 Like

You’re right, graphs of unknown / dynamic size are out of scope. Multigraphs are already a challenge.

1 Like

If hypergraphs make it to the Graphs 2.0 interface, then multigraphs are easily implemented by representing its dual. I think I will write my proposal in the corresponding GitHub issue.