Best data structure for mapping between nodes and (known) cliques in a graph, that is easily updated after bisection?

Hi Julia community

I have a question regrading the best data structure for a certain purpose.
I have a graph with of nodes i=1,…,n and edges and from the way I construct the graph I know all the maximum cliques c=1,…,k.
I am looking for a data structure that maps between vertices ans cliques that should have the following features:

  1. Quickly returning all the cliques that contain a node, given that node
  2. Quickly returning all nodes of a given clique,
  3. Quickly execute the following update to the data:
    • For two neighbouring nodes, called parent_1 and parent_2, we dissect the edge by a new node (doesn’t yet affect the data structure but is important to understand the next step)
    • For all cliques that contain both parents: delete that clique and replace it with two new cliques where the new node replaces parent_1 in the first new clique and parent_2 in the second one (all other nodes are as in the old, deleted clique).

The obvious choice for me would be a dataframes, where the rows are the nodes and the columns are the cliques and the entries are one/zero depending whether a node is/isn’t a member of a clique. However, I feel with the Dataframe I create a lot of unnecessary information since it stores a lot of zeros, as cliques are small compared to the whole graph. And I am not sure if tasks 1) and 2) become slows as the graph grows.

I have also played with two Dicts: A Vertex_to_Clique_Dict and a Clique_to_Vertex_Dict, which let me execute task 1) and 2) above quickly, but is very cumbersome when it comes to the update in 3), which is on the other hand easy when working with dataframes.

Is there a natural data structure that lends it self to my needs?
I am thankful for any suggestions.

Edit: Maximum Cliques instead of Maximal Cliques

You’ll need to also tell us, roughly, n, k and the typical sizes of max cliques. Ideally you post code to generate sample data that roughly approximates the distributions of the real data you’re interested in.

I am assuming that you’re using terminology where “maximal clique” means “clique where you cannot add another vertex and it still is a clique”, as opposed to “maximum clique” meaning “a clique of the maximal size possible in the graph”? (every maximum clique is maximal, but the reverse is not true)

I think an appropriate framing of your problem is the following:

  1. You have a bipartite graph between nodes 1:n and max-cliques 1:k, with an edge between a node and a max-clique iff the node is part of the max-clique.
  2. This datastructure allows you to reconstruct the original graph: Two nodes are adjacent if their distance in the bipartite graph is 2, i.e. iff there is a clique that contains both
  3. You can easily reconstruct all nodes contained in a clique.
  4. You can easily get all maximal cliques containing a node. From that you can easily get all cliques containing that node (i.e. subsets of the maximal cliques), but that may well be an exponential number!

The approach of having two dicts looks like it makes a lot of sense. I don’t see the problem? You should post the dataframe and dict-based approaches here.

You should also consider bit-packing the entire thing, but that very much depends on the sizes/sparsity you’re working on.

1 Like

Thank you

First sorry for the confusion I really ment “Maximum Cliques”, I edited that part in the question.

Second, the graph is generated quite simply: We start with a complete graph of size N, then we ad nodes according to the procedure outlined in 3), which is basically a bisection. Thus after every step the size of the graph increases by one, but there might be multiple new Cliques, all of size N. In my application the many bissection steps are taken so that the size of the graph quickly becomes much bigger than N.

The problem with maintaining two Dicts, is that updating both in accordance with 3) seems to be quite tedious. I haven’t found an efficient solution.

I will try and implement your suggested solution using bipartite graphs. This seems very efficient since the update only involves changing some edges. Thank you for the suggestion!

Then, in your graph, maximum cliques are the same as maximal cliques :slight_smile:

Tedious, as in “not a 1-liner”, is not the same as “efficient”, as in “runs quickly”.

To give an example:

struct VertexAndCliques{T <: AbstractSet{Int}}
  cliques::Vector{T} # cliques[i] is the set of clique-ids that contain vertex i
  vertices::Vector{T} # vertices[i] is the set of vertices in clique i
  scratch_space::T # a way to re-use allocations. 
end

#returns -1 on error, otherwise returns the number of new cliques
function split!(data::VertexAndCliques, i, j)
  i != j || i > length(data.cliques) || j > length(data.cliques)  || return -1
  
  intersect!(data.scratch_space, data.cliques[i], data.cliques[j])
  shared_cliques_frozen = data.scratch_space
  
  isempty(shared_cliques_frozen) && return -1
  
  shared_cliques = copy(shared_cliques_frozen)
  
  #register the new node
  push!(data.cliques, shared_cliques)
  newnode = length(vertices)

  for cid in shared_cliques_frozen
    vs = data.vertices[cid]
    vs2 = copy(vs)
    push!(data.vertices, vs2)
    cid2 = length(data.vertices)

    #need to replace j->newnode in vs, i->newnode in vs2
    pop!(vs, j); push!(vs, newnode);
    pop!(vs2, i); push!(vs2, newnode);
  
   #we doubled the clique, and corrected their membership list.
   #Next need fix up the list of cliques for each vertex.
   for vid in vs
      vid != i && push!(data.cliques[vid], cid2)
    end
  end
  return length(shared_cliques_frozen)
end

You can benchmark that with hash-table Set{Int} and with BitSet, which I guess will be much faster.

That being said, you can probably get a lot of mileage out of fixing N, the size of all your cliques, as a type parameter, and encoding a clique as NTuple{N,Int16}, i.e.

struct VertexAndCliques{T <: AbstractSet{Int}, N}
  cliques::Vector{T} # cliques[i] is the set of clique-ids that contain vertex i
  vertices::Vector{NTuple{N, Int16}} # vertices[i] is the set of vertices in clique i
  scratch_space::T # a way to re-use allocations. 
end

That’s because your bipartite graph has the property that the degree of each clique is the same (N), and you will never be able to represent a structure with more than 2^15 vertices anyways (you can probably get away with UInt8).

1 Like

Thank you for supplying this solution!

What I meant is that my current solution is not a one liner but is very inefficient, as in: it creates a bottle neck in my application.

I will try your solution and suggestnions and see if it is faster. Will take me a while thought to integrate it into the rest of my code.
In any case: thank you!