Graph algorithm for Lightgraph

AlexanderChen · January 21, 2021, 8:11am

Hi,

Below is a graphical representation of three SQL tables with their corresponding columns and their inter-table connections.

In red circles are four columns of two different tables that are not directly connected with each other.

I am looking for an algorithm that takes a set of vertices as input (the red circles) and then gives me the shortest possible path that links the input vertices. In this case:

C3.2->T3
C3.1->T3
T3->C3.4
C3.4->C2.1
C2.1->T2
T2->C2.3
C2.3->C1.5
C1.5->T1
T1->C1.1
T1->C1.2

I realize that the first two and the last two especially are not per se what you would expect, because they branch inward (or outward), but any algorithm that comes reasonably close to this would already be a great help!

healyp · January 21, 2021, 10:13am

I don’t think there are any specialized algorithms for multiple start / end points. Even if there were, AFAIK the time complexity of the best all-pairs shortest path algorithm isn’t a whole lot better than the naïve algorithm of running SP on all C(|V|, 2) start/end vertex pairings: maybe a log factor of difference.

How large (relative to the vertex set) do you expect your circled vertices to be? If it’s a small fraction then you would probably be as well off to go with the naïve strategy above.
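The naïve strategy can be sketched roughly as follows: run a breadth-first search for every pair of circled vertices and collect the paths. This is only an illustration for an unweighted, undirected graph stored as a plain dictionary; the names `bfs_path` and `pairwise_shortest_paths` are made up for this sketch and do not come from any package.

```python
from collections import deque
from itertools import combinations


def bfs_path(adj, source, target):
    """Return one shortest path from source to target as a list, or None.

    adj is assumed to be a dict mapping each vertex to an iterable of
    its neighbours (undirected, unweighted).
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent pointers back to the source.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


def pairwise_shortest_paths(adj, terminals):
    """Shortest path for every unordered pair of the given vertices."""
    return {(s, t): bfs_path(adj, s, t) for s, t in combinations(terminals, 2)}
```

With k circled vertices this runs C(k, 2) searches, which stays cheap as long as k is small compared to the whole vertex set.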

AlexanderChen · January 21, 2021, 11:08am

… well, that isn’t good news.

Would it help if I took advantage of the structure itself? E.g. instead of an input of 4 columns I only give T3 as start and T1 as end (every column is connected to a table). This will of course give me only part of the story, but at least something to start out with.

Where can I find information on designing an algorithm for graphs specifically for julia or in python?

thanks,

FPGro · January 21, 2021, 12:39pm

It would help to know a little more about the structure of your graph in the worst case.

Is it always a tree, or can there be multiple connections between Ts?
Are your input vertices always leaves (have only one connection), or could you, if we take your example, start with something like (C2.1, C1.5) as input?
Are there weights associated with anything, or is it just (connection or no connection)?

GunnarFarneback · January 21, 2021, 12:49pm

If I understand your problem correctly, the term you should search for is Steiner Tree.

AlexanderChen · January 21, 2021, 4:12pm

Hi,

Is it always a tree, or can there be multiple connections between Ts?
The table/column structure will always be a star graph, meaning one point in the middle, which is the table, and n columns (n > 1). One table can have connections with multiple other tables, but it must have at least one connection, otherwise it is disconnected.
Are your input vertices always leaves (have only one connection), or could you, if we take your example, start with something like (C2.1, C1.5) as input?
Good question. Yes, it should be possible to use a leaf that is simultaneously also a ‘bridge’ between its own table and another.
Are there weights associated with anything, or is it just (connection or no connection)?
No, there are no weights and the edges are bidirectional. A weight could be added, but they would all be the same weight since there is no distinction (cost difference) between them.

I hope I understood your questions correctly; please tell me if I missed the mark!

AlexanderChen · January 21, 2021, 4:16pm

Hi @GunnarFarneback ,

The general wiki page on ‘Steiner trees’ did not give me enough to make the connection between my current problem and that concept. Could you explain the connection to me?

best,

GunnarFarneback · January 21, 2021, 4:32pm

The minimum Steiner tree problem in graphs means to find the smallest tree that connects a given set of terminal nodes, which seemed to be what you wanted to find.
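To make the definition concrete, here is a brute-force sketch for small unweighted graphs. In an unweighted graph a tree on k vertices has k - 1 edges, so the smallest connected vertex set that contains all terminals gives a minimum Steiner tree (take any spanning tree of it). The edge list is an assumption: it is rebuilt from the path listed in the first post, and the extra columns (C1.3, C1.4, C2.2, C3.3) are invented filler, since the original picture is not available. The function names are hypothetical too.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, vertices):
    """True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def smallest_connecting_set(adj, terminals):
    """Smallest vertex set containing all terminals that is connected.

    Exponential in the number of non-terminal vertices; only meant for
    tiny graphs.
    """
    terminals = set(terminals)
    others = sorted(set(adj) - terminals)
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            candidate = terminals | set(extra)
            if is_connected(adj, candidate):
                return candidate
    return None


# Assumed reconstruction of the three-table example.
edges = (
    [("T1", f"C1.{i}") for i in range(1, 6)]
    + [("T2", f"C2.{i}") for i in range(1, 4)]
    + [("T3", f"C3.{i}") for i in range(1, 5)]
    + [("C3.4", "C2.1"), ("C2.3", "C1.5")]  # inter-table connections
)
adj = build_adj(edges)
solution = smallest_connecting_set(adj, ["C3.1", "C3.2", "C1.1", "C1.2"])
print(sorted(solution))
```

On this assumed graph the result contains exactly the vertices that appear in the ten edges listed in the first post.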

FPGro · January 22, 2021, 12:09am

Yeah, what I meant was rather whether cycles are possible, like T1>C1.2>C2.1>T2>C2.2>C3.1>T3>C3.2>C1.1>T1

Gunnar did point you in the right direction. This also tells you that this problem is NP-complete in the general case (whether your case is general or sufficiently structured is still an open question).

Nonetheless, this means nothing more than that you don’t want to solve it exactly for big, complex problems. You should be fine with any of the approximation algorithms for that problem.

So, anyways, what you want to do is (probably) the following (disclaimer: I’m not a graph theorist):

reduce your problem size as much as possible:

you can safely drop all vertices with 1 edge that are not your “starting points” (repeatedly, since dropping one can leave another vertex with only 1 edge); they can never minimally connect other vertices and thus cannot appear in the solution
think about further reductions: is every connection between tables of the form T1>C1.x>C2.y>T2 without any side branches? Or can something like this exist:

[image]

And if it did, would this imply:

[image]

And what shall the cost of “traversing” this be? Could be 3, but it may be just as cheap having 3 of the interlinked tables as having 2 or 4. (This depends on your definition, is every edge/link costly or do you just want to minimize the amount of inter-table connections?)
Depending on that you may find more powerful reductions, e.g. it may be possible to contract each inter-table connection to a single vertex, or a single edge! If you can reduce your problem size sufficiently, the choice of algorithm is probably near-irrelevant, it’s rare that you’ll link 100+ SQL tables after all, so your worst case problem may still be trivially small with a “costly” algorithm.

Depending on the topology that is allowed (will those diagrams contain cycles or not, that’s really the point), you either pick one of the algorithms from the Wikipedia page or a different one from that field and apply it liberally.
Or, if these are always tree-like (cycle-free), a simple DFS will give you an optimal solution in acceptable time.
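The leaf-dropping reduction from the list above can be sketched like this. It assumes the graph is a dict of neighbour sets; `prune_non_terminal_leaves` is a hypothetical helper name. If the graph is cycle-free and all starting points lie in one connected component, what remains of that component is exactly the subtree connecting them, so no further search is needed in that case.

```python
def prune_non_terminal_leaves(adj, terminals):
    """Repeatedly remove vertices with at most one edge that are not terminals.

    adj: dict mapping vertex -> set of neighbours (undirected).
    Returns a pruned copy; the input is left untouched.
    """
    terminals = set(terminals)
    pruned = {v: set(ns) for v, ns in adj.items()}
    leaves = [v for v, ns in pruned.items() if len(ns) <= 1 and v not in terminals]
    while leaves:
        v = leaves.pop()
        if v not in pruned:
            continue  # already removed via an earlier queue entry
        for w in pruned.pop(v):
            pruned[w].discard(v)
            # Removing v may have turned its neighbour into a new leaf.
            if len(pruned[w]) <= 1 and w not in terminals:
                leaves.append(w)
    return pruned
```

On a graph with cycles the same function is still a safe preprocessing step; it just will not remove everything that is irrelevant.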

AlexanderChen · January 25, 2021, 12:21pm

@FPGro & @GunnarFarneback Thank you both for the good feedback. Just a couple of answers to FPGro’s open questions.

Yeah, what I meant was rather whether cycles are possible, like T1>C1.2>C2.1>T2>C2.2>C3.1>T3>C3.2>C1.1>T1

Gunnar did point you in the right direction. This also tells you that this problem is NP-complete in the general case (whether your case is general or sufficiently structured is still an open question).

Yes, a cyclic path is technically possible.

reduce your problem size as much as possible:

you can safely drop all vertices with 1 edge that are not your “starting points”, they can never minimally connect other vertices and thus can not appear in the solution

think about further reductions: is every connection between tables of the form T1>C1.x>C2.y>T2 without any side branches? Or can something like this exist:

[image]

And if it did, would this imply:

[image]

Yes, I was thinking along similar lines. Once you know the input nodes that need to be connected, one could construct a temporary graph that only features the table vertices, the input vertices (both starting and finishing) and the ‘join vertices’. I was thinking that I could go one step further and temporarily take out the input vertices and exchange them for the tables to which the input vertices are connected. This way I only have the ‘table’ and ‘join’ vertices.

Yes, it is implied that if the first picture is possible then the second picture is possible. But until that edge is made by the “graphista”, for lack of a better term, it is not possible to travel that edge. So there is no knowledge-graph kind of reasoning. It’s pretty plain in that respect.
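Taking that idea one step further still, the column-level graph can be contracted to a table-level graph. The sketch below assumes two inputs that are not defined anywhere in this thread: `owner`, a dict from each column to its table, and `joins`, the list of column-to-column edges between tables. Both names are hypothetical.

```python
def table_graph(owner, joins):
    """Contract join edges between columns to edges between their tables.

    owner: dict mapping column -> table it belongs to.
    joins: iterable of (column, column) inter-table connections.
    Returns a dict mapping table -> set of directly joined tables.
    """
    tg = {t: set() for t in owner.values()}
    for a, b in joins:
        ta, tb = owner[a], owner[b]
        if ta != tb:  # ignore a join of a table with itself
            tg[ta].add(tb)
            tg[tb].add(ta)
    return tg


# Assumed data, taken from the path in the first post.
owner = {"C1.5": "T1", "C2.1": "T2", "C2.3": "T2", "C3.4": "T3"}
joins = [("C3.4", "C2.1"), ("C2.3", "C1.5")]
print(table_graph(owner, joins))
```

The contracted graph loses the information about which columns realise each join, so the column pairs would have to be stored separately if they are needed in the final answer.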

And what shall the cost of “traversing” this be? Could be 3, but it may be just as cheap having 3 of the interlinked tables as having 2 or 4. (This depends on your definition, is every edge/link costly or do you just want to minimize the amount of inter-table connections?)

I want to keep the inter-table connections to a minimum, so I will have to start thinking about a weighted graph. I am not sure which weights should be given. I have to visualize that better to understand it.
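One possible weighting, purely as an assumption for illustration: give every column-to-column join a cost much larger than a table-to-column edge, so that any path with fewer joins beats any path with more joins. The values 100 and 1 below are arbitrary placeholders, and `weighted_adj` and `dijkstra` are hypothetical helper names.

```python
import heapq


def weighted_adj(table_edges, join_edges, join_cost=100, table_cost=1):
    """Build a weighted undirected adjacency dict from two edge lists."""
    adj = {}
    for edges, cost in ((table_edges, table_cost), (join_edges, join_cost)):
        for u, v in edges:
            adj.setdefault(u, {})[v] = cost
            adj.setdefault(v, {})[u] = cost
    return adj


def dijkstra(adj, source):
    """Distances from source in a graph with non-negative edge weights.

    adj: dict mapping vertex -> dict of neighbour -> weight.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, cost in adj[v].items():
            nd = d + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist
```

For the join cost to dominate, it has to exceed the largest possible total of cheap edges on any path, so the placeholder value would need to be adapted to the size of the real graph.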

FPGro · February 6, 2021, 9:35am

Hey, just wanted to ask how far you got with your approach, and whether it would be worthwhile to share it?

AlexanderChen · February 6, 2021, 1:00pm

Hi @FPGro,

The approach above is what I am currently pursuing. At the moment I am experimenting with how best to transform the input of columns into a list of their tables. So if I give it C2.1, it gives me T2 back. I wanted to see if I can find a pattern in the Matrix() that I can transform into an algorithm. Because of the star graph structure there is a clear pattern, but I am not convinced I can find an algorithm that will always work. So I am thinking about using MetaGraphs, but I need to be able to ‘query’ at the MetaGraph level; I will need to know more about the package to understand whether that is feasible.
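If it is known which vertices are tables, the column-to-table lookup can be read directly off the adjacency matrix: a column’s table is the one table index with a 1 in that column’s row. A minimal sketch, assuming the matrix is a plain list of 0/1 rows and that `names` lists the vertex names in matrix order (both are assumptions, as is the function name):

```python
def column_owners(matrix, names, table_names):
    """Map each column vertex to the single table it is attached to.

    matrix: symmetric 0/1 adjacency matrix as a list of rows.
    names: vertex name for each row/column index of the matrix.
    table_names: collection of the names that denote tables.
    Columns without exactly one table neighbour are skipped.
    """
    table_idx = [i for i, n in enumerate(names) if n in table_names]
    owners = {}
    for i, name in enumerate(names):
        if name in table_names:
            continue
        hits = [names[j] for j in table_idx if matrix[i][j]]
        if len(hits) == 1:
            owners[name] = hits[0]
    return owners
```

This relies on the star structure described earlier in the thread, where every column is attached to exactly one table; edges between two columns are ignored by the lookup.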

I also have concluded that I need weights on my edges but that is not a big challenge.

FPGro · February 6, 2021, 1:15pm

Sounds good. I just searched out of curiosity, and there appears to be an approximate minimal Steiner tree algorithm in the LightGraphs package already, so if you can transform your graph in a suitable way and determine the weights properly, you can probably use that. Good luck!

github.com

sbromberger/LightGraphs.jl/blob/ce533729381a6c0b0d8a1eb5f0c3e058833cb55b/src/steinertree13/steiner_tree.jl#L43


      
The minimum steiner tree problem involves finding a subset of edges in `g` of minimum weight such
that all the vertices in `term_vert` are connected.

`t = length(term_vert)`.

### Performance
Runtime: O(t*(t*log(t)+|E|*log(|V|)))
Memory: O(t*|V|)
Approximation Factor: 2-2/t
"""
function steiner_tree end

@traitfn function steiner_tree(
    g::AG::(!IsDirected),
    term_vert::Vector{<:Integer},
    distmx::AbstractMatrix{U} = weights(g)
    ) where {U<:Real, T, AG<:AbstractGraph{T}}

    nvg = nv(g)
    term_to_actual = T.(term_vert)
    unique!(term_to_actual)

AlexanderChen · February 6, 2021, 2:41pm

Thanks! I thought it did not exist, so I had sort of resigned myself to having to figure it out on my own.

FPGro · February 6, 2021, 3:22pm

Sorry then for not pointing that out earlier. I thought you had found it, but it makes sense that you didn’t find much, as Gunnar had only just pointed out the correct name for that problem. Anyway, hope this helps!

AlexanderChen · February 6, 2021, 7:02pm

No worries. I was reading the docs and there was no mention of a Steiner algorithm, but in the help?> prompt I do see it.

Just for reference, this is the graph that I think is complex enough to start with:

and this is the pattern in the matrix of this same graph:
