Creating a Weighted Graph

I would like to try using a weighted graph. To this end, I added SimpleWeightedGraphs to Atom, created an erdos_renyi directed graph, and then applied SimpleWeightedGraphs to this graph.

const G = erdos_renyi(10000, 0.0012, is_directed=true)
WG = SimpleWeightedGraph(G)

I would now like to add weights to selected edges. What is the easiest approach to doing that. I know how to construct the graph from scratch, one edge at a time. But let us say that I wanted the first half of the edges to have weight=2, and the second half to have weight=3. What would be the best approach to accomplish this? Thanks!

There’s no good way to do this when you’re using a generator that’s designed for SimpleGraphs. The best you can do is iterate over the edges of WG and then just re-add the edges that you want with the weights you want. (When you call add_edge!() with an existing edge on a SimpleWeightedGraph with a new weight, the new weight overwrites the old weight.)

Thanks. Delving more into SimpleWeightedGraphs. Looking at

G = erdos_renyi(100, 0.0012, is_directed=true)
WG = SimpleWeightedGraph(G)
weights = WG.weights
adj = adjacency_matrix(WG)

I notice that adj is a copy of the adjacency_matrix, while weights is a reference to the adjacency matrix (which can have values other than unity). My question is: why make a copy of the adjacency_matrix rather than a reference? The user could use the copy command if a copy is required. I do not follow the logic. Shouldn’t this behavior be documented? After all, the objective is to maximize default efficiency.

Looking further, I notice that SimpleWeightedGraph is in a different type hierarchy (with a common ancestor AbstractGraph.). I would have thought that a SimpleWeightedGraph would be a supertype of SimpleGraph. What are your thoughts on this? I do understand that because the SimpleWeightedGraph library was built after the SimpleGraph, that this might be difficult. But perhaps my logic is wrong? I am using your library, Seth as a learning tool.

Thank you.

weights is universal to AbstractGraphs and is intended to be used as an indexable structure to provide the weight of an edge from i to j by accessing weights[i, j]. It’s generally not supposed to be used as an adjacency matrix (DefaultDistance, the weight structure for SimpleGraphs, is NOT the adjacency matrix, but a struct that returns 1 for any index that corresponds to an edge), but for SimpleWeightedGraphs it is a reference to the underlying data structure. It is designed to be read-only.

The answer to your question is because we’ve found that in a lot of cases people want to modify the adjacency matrix (for some calculation) after they get it. If it is a reference, then the underlying graph might be modified as well, leading to potentially undefined behavior (if your new matrix is no longer symmetric, then it cannot be a valid undirected graph, for example).

Julia has no concept of concrete supertypes. A type is either abstract or concrete, and there has been no real need to introduce an intermediate abstract type between AbstractGraph and SimpleGraph.

I appreciate the questions and am happy to answer them. Keep 'em coming.

2 Likes

Thank you, Seth. Your answers make sense. So if I wanted (just for this discussion) a reference to the adjacency matrix, how would I do it?

I also notice that a SimpleGraph is a structure with fields fadjlist, badjlist, ne.
However, the structure of a weighted graph has fields weights, which is a CSC sparse matrix. The fields present in SimpleGraph are not there. I guess directly accessing the fields of a SimpleGraph is poor programming practice. :slight_smile: I still feel that SimpleGraph is a special case of a SimpleWeightedGraph with a weight of 1. At least that is how I would have implemented it. But I think I understand your point about the type hierarchy. So perhaps an AbstractGraph should really be a weighted graph.

How would you implement a MultiLayer model? Multilayers are collection of graphs G_i whose respective nodes are linked together. There is an entire theory on that (and a few Python, at least one, libraries). I just learned about this. I was thinking of implementing an SIR model where people would be modeled on two graphs. On one graph, they are at home, and on the other graph, they are at work. Multilayer graphs would (in principle allow this). More specifically, I would need a Multiplex graph (where the same nodes of two graphs are connected together). I am thinking out loud.

Here is a more serious question:


const G00 = F.erdos_renyi(100000, 0.0006, is_directed=true)
const GW = SimpleWeightedGraph(G00)

I expected GW and G00 to be the same graph, but with G00 converted into a weighted graph with the same topology. However:


> ne(G00)
6002665

> ne(GW)
6000925

The number of edges is different in both graphs. How is that possible?

Although my analysis code will run because a graph is a graph, this is disconcerting.

The answer to my own question is that I should have written:

const GW0 = SimpleWeightedDiGraph(G00)

instead of using an undirected SimpleGraph with a directed graph as an argument. This is an example where my error should have been caught by the package through proper typing. The argument of a directed graph generator should be a directed graph and vice-versa with undirected graphs. A simple conditional statement would work to this effect using the is_directed method.

At least for the JuliaGraphs ecosystem it is. Please don’t access internal fields of any JuliaGraphs structs. We can’t prevent it because there’s no such thing as access control in Julia, but it’s undefined behavior and your code is pretty much guaranteed to break on you at some point without warning if you do this.

I still feel that SimpleGraph is a special case of a SimpleWeightedGraph with a weight of 1. At least that is how I would have implemented it.

From a theoretical perspective, you’re correct. This is what a SimpleGraph is. But from a computational complexity perspective, using an adjacency list to represent a graph (what we do with the fadj and badj fields) provides much better performance in several key use cases over a sparse matrix (which is what SWG uses). For even better performance with unweighted large graphs, check out StaticGraphs.jl. The tradeoff there is that you get amazing cache locality benefit but you lose the ability to modify the graph after creation. It’s all about tradeoffs when it comes right down to it.

I don’t know of a multilayer model package. There was some discussion in Slack a few weeks ago but I wasn’t really participating in it, and unfortunately a clever solution using existing packages escapes me right now so I don’t have any recommendation.

1 Like

Because G00 is a directed graph, which means that an edge from i to j is different than an edge from j to i (and both are counted as distinct edges). GW is an undirected graph, which means that an edge from i to j is the same as its reverse. Therefore, I would expect fewer edges in the undirected version of a directed random graph, as, for example, a directed graph with edges from 1 -> 2 and 2 -> 1 will show up as having two edges, while the corresponding undirected graph will show one edge.

The argument of a directed graph generator should be a directed graph and vice-versa with undirected graphs.

We assume that when you call a constructor for a graph of a specific type you know what you’re doing :slight_smile: There is a use case for creating undirected graphs from directed ones, and we don’t want to foreclose on that.

Edited to add: in the forthcoming LG 2.0, graph generators are now type-dispatched constructors, so for example,

g = erdos_renyi(100_000, 0.0006, is_directed=true)

becomes a constructor:

g = SimpleDiGraph(ErdosRenyi(100_000, 0.0006))

Creating the undirected version is left as an exercise to the reader.

Sorry I missed this question. For a SimpleWeightedGraph, you would go against my earlier recommendation and just access weights(g). But that won’t work for other graph types, and it’s not guaranteed to work for future uses of SimpleWeightedGraph. It’s also exceedingly fragile: if you have an undirected graph and you modify the matrix so that it’s no longer symmetric, you will (probably silently) get errors in your code that will be very difficult to detect. But if you must have a reference, that’s the current best way to do it.

But please don’t.

No worries, Seth. I have programmed good C++ for a long time and I understand encapsulation best practices. But that does not detract from my curiosity. :slight_smile:

1 Like

Again, I’m happy to answer these questions. They’re forcing me to think about design decisions we made several years ago, and make sure they’re still valid. Having new eyes on things, especially now that we’re in the middle of a large redesign of the package, is super helpful. So thanks, and again - keep the questions coming. I’ll do my best to be responsive.

Edited to add

good C++

Never heard of it. :wink:

4 Likes

Here is a question on the weighted graph:

I would like the weight of edge i,j so, for example:

w = edge(i,j)

Alternatively, the weights of all the neighbors of i would be nice:

w = neighbors_weights(i,j)

where w would return an array.

I currently have the following in my code:


for ν in neighbors(G, node_u.index)
findTransSIR(Q, t, τ, w, node_u, nodes[ν], t_max, nodes, expo_τ)
end

But I also need the weights along with the neighbors, so perhaps:

for (w,v) in neighbors(WG, node_u.index)
etc
end

in the case where WG is a weighted graph. What is the easiest way to accomplish this currently?

Thanks.

I would like the weight of edge i,j

You’d like to set it or access it? For the former, add_edge!(g, i, j, w). For the latter, weights(g)[i, j].

the weights of all the neighbors of i would be nice

Naively, this is weights(g)[i, :] (for undirected graphs, and for the outneighbors of directed graphs. For all neighbors (in and out) of directed graphs, you’ll want to union it which gets tricky). Note also that if you’ve removed edges you will have some zeros in there that you’ll need to filter.

But I also need the weights along with the neighbors

These will be the non-zero entries, and their indices, of the above output, which is itself a sparse vector.

Got it! Your answers make sense, and I would eventually have figured it out. I would not have guessed add_edge! since you are not adding an edge. The exclamation point usually means change the first argument, with no convention regarding the remaining arguments. Seems to me that add_edge! could be renamed, or an alias called get_weight(g,i,j,w) could be provided. Of course, I could also create the alias :-). Thanks!

erm, I think there may be some confusion. add_edge!(g, i, j, w) does modify its first argument, g, by adding an edge in graph g from i to j with weight w. For SimpleWeightedGraphs, this is also how you “reweight” an existing edge.

If you wanted just to access the weight of an edge that you know exists between i and j, you use weights(g)[i, j]. There’s no modification done during an access.

I was not very observant. I fixed my program. Adding the weights added 10 seconds to a program that took 18 seconds. That is not really acceptable. I must not be doing something right. Or else, I should be able to access the weights more efficiently. Without providing full source code, here is what I have:

wghts = weights(G)   #### ADDED
	for ν in neighbors(G, node_u.index)
		w = wghts[node_u.index, ν]   #### ADDED
		findTransSIR(Q, t, τ, w, node_u, nodes[ν], t_max, nodes)  #### ADDED w argument
	end
# etc

function findTransSIR(Q, t, Ď„, w, source, target, t_max, nodes)
    # w is the edge weight
	if target.status == :S
		inf_time = t + rand(Exponential(Ď„*w))  #### Added *w
		# Allocate memory for this list
		if inf_time < minimum([source.rec_time, target.pred_inf_time, t_max])
			new_event = Event(target, inf_time, :transmit)
			Q[new_event] = new_event.time
			target.pred_inf_time = inf_time
		end
	end
end

Nothing there looks out of line. indexing into wghts is a O(log n) operation, I believe. I can’t imagine this takes significantly longer than the code without the weights accessor.

Are you adding edges or doing any graph modification as part of your timed routines? Here’s some benchmark code that tests accessing the weights:


julia> g = LightGraphs.SimpleDiGraph(100_000, 5_000_000)
{100000, 5000000} directed simple Int64 graph
       
julia> s = SimpleWeightedDiGraph(g)
{100000, 5000000} directed simple Int64 graph with Float64 weights

julia> for e in edges(s)
           add_edge!(s, src(e), dst(e), rand())
       end

julia> a = [(src(e), dst(e)) for e in edges(s) if (src(e) + dst(e)) % 4 != 0]; # just filter arbitrarily, keeping order

julia> length(a)
3751196

julia> function foo(g, a)
          w = weights(g)
          n = 0.0
          for i in a
             n += w[i[1], i[2]]
          end
          return n
       end
foo (generic function with 1 method)

julia> @benchmark foo(s, a)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     53.333 ms (0.00% GC)
  median time:      53.550 ms (0.00% GC)
  mean time:        53.568 ms (0.00% GC)
  maximum time:     54.033 ms (0.00% GC)
  --------------
  samples:          94
  evals/sample:     1

You can see that sequentially accessing an ordered subset (3.75 million) edges in a SimpleWeightedDigraph with 100k nodes and 5 million edges takes approximately 53 milliseconds on average.

Yes, I see that. Of course, I do not know how many loops I go through, so first I must do more investigation. Something slightly messed up. Will let you know :-).

Here is an update. My long time was because my graph had 30 million edges! Now my graph has 6M edges (12 M bi-directional edges). Running an unweighted graph takes 6.2 sec, while the weighted graph takes 8.9 seconds, an increase of 43%. Much better than before. I do not know whether this is believable or not.

Your timing suggests that 5 million edges (I am rounding off) takes 50 milliseconds, so that is 50*1.e-3/(5e6) = 10^(-8) sec per weight access = 10 ns.

In my code, I compute the weight 120 times per infection and there were approximately 100,000 infections. So that is 1.2e8 weight accesses, which is 1.2 seconds. My code timing increased by 2.2 seconds, so at least we are in the right ballpark. There are other expenses such as an additional function argument, and an additional multiply in an Exponential distribution. So the numbers are now believable. I am sure we can do better, at most a factor of 50% (for the entire code), but this is good enough for now.

3.75 million edges (not 5m) took 53 ms - so it’s about 15ns per access, which gets you ~1.8 seconds, so you’re even more in the ballpark.