below is a graphical representation of 3 SQL Tables with their corresponding columns. And their intertable connections.

In red circles are four columns of two different tables that are not directly connected with each other.

I am looking for an algorithm that takes a set of vertices as input (red cicles) and then gives me the shortest path possible that links the input vertices. In this case:

I realize that especially the first 2 and the last 2 are not perse what you would expect because they branch inward (or outward) but any algo that comes reasonably close to this is already a great help!

I donât think there are any specialized algorithms for multiple start / end points. Even if there were AFAIK the time complexity of the best all-pairs shortest path algorithm isnât a whole lot better than the naĂŻve alg of running SP on all C(|V|, 2) start- end-vertex pairings â maybe a log factor in the difference.

How large (relative to the vertex set) do you expect your circled vertices to be? If itâs a small fraction then you would probably be as well off to go with the naĂŻve strategy above.

would it help if I take advantage of the structure itself. e.g. instead of input of 4 columns I only give T3 as start and T1 as end (every column is connected to a Table)? This will of course give me only part of the story but at least something to start out with.

Where can I find information on designing an algorithm for graphs specifically for julia or in python?

It would help to know a little more about the structure of your graph in the worst case.

is it always a tree or can there be multiple connections between Ts

are your input vertices always leaves (have only one connection) or could you, if we take your example, start with something like ( C2.1, C1.5 ) as input?

are there weights associated with anything or is it just (connection or no connection)

is it always a tree or can there be multiple connections between Ts â The table/column structure will always be a star_graph meaning 1 point in de middle which is the table and n columns (n>1). But it could be that 1 table has connections with multiple other tables but it must at least have 1 connection otherwise its disjoined.

are your input vertices always leaves (have only one connection) or could you, if we take your example, start with something like ( C2.1, C1.5 ) as input? â Good question, yes it should be possible to use a leave that is simulatenously also a âbridgeâ between its own table and another.

are there weights associated with anything or is it just (connection or no connection)â No, there are no weights and the edges a bidirectional.A weigh could be added but they would all be the same weight since there is no distinction (cost difference) between them

I hope i understood your question correctly please tell me if i missed the mark!

The general wikipage of âsteiner treesâ
did not give me enough to make the connection between my current problem and that concept. Could you explain the connection to me?

The minimum Steiner tree problem in graphs means to find the smallest tree that connects a given set of terminal nodes, which seemed to be what you wanted to find.

Yeah What I meant was rather if circles are possible, like T1>C1.2>C2.1>T2>C2.2>C3.1>T3>C3.2>C1.1>T1

Gunnar did point you into the right direction. This also tells you, that this problem is NP-complete in the general case. (if your case is general or sufficiently structured is still under question)

Nontheless, this means nothing but that you donât want to solve it exactly for big, complex problems. You should be fine with any of the approximaton algorithms for that problem.

So, anyways, what you want to do is (probably) the following (disclaimer: Iâm not a graph theorist):

reduce your problem size as much as possible:

you can safely drop all vertices with 1 edge that are not your âstarting pointsâ, they can never minimally connect other vertices and thus can not appear in the solution

think about further reductions: is every connection between tables of the form T1>C1.x>C2.y>T2 without any side branches? Or can something like this exist:

And what shall the cost of âtraversingâ this be? Could be 3, but it may be just as cheap having 3 of the interlinked tables as having 2 or 4. (This depends on your definition, is every edge/link costly or do you just want to minimize the amount of inter-table connections?)
Depending on that you may find more powerful reductions, e.g. it may be possible to contract each inter-table connection to a single vertex, or a single edge! If you can reduce your problem size sufficiently, the choice of algorithm is probably near-irrelevant, itâs rare that youâll link 100+ SQL tables after all, so your worst case problem may still be trivially small with a âcostlyâ algorithm.

Depending on the topology that is allowed (will those diagramms contain circles or not, thatâs really the point), you eitehr pick one of the algorithms from the wikipedia page or a different one from that field and apply it liberally.
Or - if these are always tree-like (cycle free) - a simple dfs will give you an optimal solution in acceptable time.

@FPGro & @GunnarFarneback Thank you both for the good feedback just a couple of answers to open questions concering FPGro.

Yeah What I meant was rather if circles are possible, like T1>C1.2>C2.1>T2>C2.2>C3.1>T3>C3.2>C1.1>T1

Gunnar did point you into the right direction. This also tells you, that this problem is NP-complete in the general case. (if your case is general or sufficiently structured is still under question)

yes a circle path is technically possible.

reduce your problem size as much as possible:

you can safely drop all vertices with 1 edge that are not your âstarting pointsâ, they can never minimally connect other vertices and thus can not appear in the solution

think about further reductions: is every connection between tables of the form T1>C1.x>C2.y>T2 without any side branches? Or can something like this exist:

yes, I was thinking along similar lines. Once you know the input nodes that need to connect one could construct a temp graph that only features the tables vertices, the input vertices (both starting and finishing) and the âjoin verticesâ. I was thinking that I could go one step further and take out temporarily the input vertices and exchange them for the tables on which the input vertices are connected. This way I only have the âtablesâ and âjoinâ vertices.

Yes, it is implied that if the first picture is possible then the second picture is possible. But until that edge is made by the âgraphistaâ for a lack of a better term, it is not possible to travel that edge. So there is no knowledge graph kind of reasoning. Its pretty plain in that respect.

And what shall the cost of âtraversingâ this be? Could be 3, but it may be just as cheap having 3 of the interlinked tables as having 2 or 4. (This depends on your definition, is every edge/link costly or do you just want to minimize the amount of inter-table connections?)

I want to keep the inter-table connections to a minimum so I will have to start thinking about a weighted graph. Not sure which weight should be given.I have to visualize that better to understand it.

The approach above is want I am currently pursuing. At the moment I am experimenting how to best transform the input of columns into a list of their tables. So if I give it C2.1 it gives me T2 back. I wanted to see if I can find a pattern in the Matrix() that I can transform into an algorithm. Because of the star_graph structure there is a clear pattern but I am not convinced i can find an algo that will always work. So I am thinking about using Metagraphs but I need to be able to âqueryâ on Metagraph levl, I will need to know more of the package to understand if that is feasible.

I also have concluded that I need weights on my edges but that is not a big challenge.

Sounds good. I just searched out of curiosity, and there appears to be an approximate minimal steiner tree algorithm in the LightGraphs package already so if you can transform your graph in a suitable way and determine the weights properly, you can probably use that. Good luck!

Sorry then for not pointing that out earlier I thought you did but it makes sense that you didnât find much, as Gunnar just pointed out the correct name for that problem. Anyways, hope this helps!