I came across an interesting problem recently that can be solved with the shortest path algorithms from Graphs.jl. The only problem is that it takes a “long time” (julia scale ) to build the graph because there are 250K vertices and ~1mm edges.
In my situation, the existence of these edges can be calculated on the fly. I wonder if it’s possible to build a lazy graph without having to materialize all the edges upfront. Perhaps I just pass a function that will return
true if there’s an edge between the two vertices. My vertices are just numbered sequentially 1:n.
Does anyone know if Graphs.jl can support that or if there’s another package with such functionality? Thanks
As far as I’m aware, there is still no package for building lazy graphs (and I think it would be an awesome feature). You can totally make a new graph type extending
AbstractGraph, and use the functions of Graphs.jl.
You just need to implement all the API functions, you can check the documentation to create an alternate graph type. vertices are labelled by a subtype of
Integer. Despite what claims the documentation, you should have the neighbors functions return vertices in ascending order and the vertices must form a continuous range starting from one (each vertex must be lower than
nv(g)), otherwise some functions will not work as expected. This is especially true for
astar implementation, where vertices are directly used for indexing.
You can implement
What is wrong in the docs? Doesn’t it say “ascending order”?
Yes, but it does not speak of continous range starting from one, and ascending order is only for
AbstractSimpleGraph, so in theory,
AbstractGraph doesn’t require it, but I’m not sure how well this is respected in the codebase…
Also, if your performance problem is only in building the graph, how do you construct it? If you add every single edge one by one, this is a very inefficient way to build it, because SimpleGraphs maintain the adjacency lists in ascending order, so it does recompute a lot of things for each edge addition. Edge addition is not constant time for SimpleGraph because of it’s internal structure. You can improve this by batching edge additions, or directly building the
badjlist for directed graphs) yourself.
It turns out that my performance problem came from populating the sparse array rather than constructing the graph. I use a sparse matrix to keep track of the edge distances, and the
setindex! calls to the array took the majority of time. That’s actually a little surprising though… I will check further when I have time later.
Thanks for the ideas above and sorry about the noise
P.S. I create all edges in an array and make a single call to create the graph.
Creating an empty sparse matrix and filling it with
setindex! isn’t particular fast because the CSC format is optimised for efficient arithmetic operations and not for changes to the sparsity structure.