I would like to try using a weighted graph. To this end, I added SimpleWeightedGraphs to Atom, created an erdos_renyi directed graph, and then applied the SimpleWeightedGraph constructor to this graph.
const G = erdos_renyi(10000, 0.0012, is_directed=true)
WG = SimpleWeightedGraph(G)
I would now like to add weights to selected edges. What is the easiest approach to doing that? I know how to construct the graph from scratch, one edge at a time. But let us say that I wanted the first half of the edges to have weight=2 and the second half to have weight=3. What would be the best approach to accomplish this? Thanks!
There’s no good way to do this when you’re using a generator that’s designed for SimpleGraphs. The best you can do is iterate over the edges of WG and then just re-add the edges that you want with the weights you want. (When you call add_edge! with an existing edge on a SimpleWeightedGraph with a new weight, the new weight overwrites the old weight.)
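A self-contained sketch of that approach, re-creating a graph like the one above (the weights 2 and 3 are just the values from the question):
using LightGraphs, SimpleWeightedGraphs

G = erdos_renyi(10_000, 0.0012, is_directed=true)
WG = SimpleWeightedGraph(G)

es = collect(edges(WG))          # snapshot the edges so we are not mutating while iterating
half = div(length(es), 2)
for (k, e) in enumerate(es)
    w = k <= half ? 2.0 : 3.0    # first half gets weight 2, second half weight 3
    add_edge!(WG, src(e), dst(e), w)   # re-adding an existing edge overwrites its weight
end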
Thanks. Delving more into SimpleWeightedGraphs. Looking at
G = erdos_renyi(100, 0.0012, is_directed=true)
WG = SimpleWeightedGraph(G)
weights = WG.weights
adj = adjacency_matrix(WG)
I notice that adj is a copy of the adjacency_matrix, while weights is a reference to the adjacency matrix (which can have values other than unity). My question is: why make a copy of the adjacency_matrix rather than a reference? The user could use the copy command if a copy is required. I do not follow the logic. Shouldn’t this behavior be documented? After all, the objective is to maximize default efficiency.
Looking further, I notice that SimpleWeightedGraph is in a different type hierarchy (with a common ancestor, AbstractGraph). I would have thought that a SimpleWeightedGraph would be a supertype of SimpleGraph. What are your thoughts on this? I do understand that because the SimpleWeightedGraphs library was built after SimpleGraph, this might be difficult. But perhaps my logic is wrong? I am using your library, Seth, as a learning tool.
Thank you.
weights is universal to AbstractGraphs and is intended to be used as an indexable structure to provide the weight of an edge from i to j by accessing weights[i, j]. It’s generally not supposed to be used as an adjacency matrix (DefaultDistance, the weight structure for SimpleGraphs, is NOT the adjacency matrix, but a struct that returns 1 for any index that corresponds to an edge), but for SimpleWeightedGraphs it is a reference to the underlying data structure. It is designed to be read-only.
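A small sketch of that distinction (SimpleWeightedGraph(3) here just builds an empty 3-vertex graph to add an edge to):
using LightGraphs, SimpleWeightedGraphs

sg = SimpleGraph(3); add_edge!(sg, 1, 2)
wg = SimpleWeightedGraph(3); add_edge!(wg, 1, 2, 4.0)

weights(sg)[1, 2]   # 1   -- DefaultDistance: unit weight for a SimpleGraph edge
weights(wg)[1, 2]   # 4.0 -- read from the underlying sparse weight matrix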
The answer to your question is that we’ve found, in a lot of cases, people want to modify the adjacency matrix (for some calculation) after they get it. If it were a reference, then the underlying graph might be modified as well, leading to potentially undefined behavior (if your new matrix is no longer symmetric, then it cannot be a valid undirected graph, for example).
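A sketch of why the copy is the safer default (assuming, as observed above, that adjacency_matrix returns an independent sparse copy): mutating the copy leaves the graph itself untouched.
using LightGraphs, SimpleWeightedGraphs, SparseArrays

wg = SimpleWeightedGraph(erdos_renyi(100, 0.05))

A = adjacency_matrix(wg)   # independent sparse copy: safe to modify
W = weights(wg)            # backed by wg's internal storage: treat as read-only

fill!(nonzeros(A), 0)      # zero out the copy's stored values...
ne(wg)                     # ...the number of edges in wg is unchanged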
Julia has no concept of concrete supertypes. A type is either abstract or concrete, and there has been no real need to introduce an intermediate abstract type between AbstractGraph and SimpleGraph.
I appreciate the questions and am happy to answer them. Keep 'em coming.
Thank you, Seth. Your answers make sense. So if I wanted (just for this discussion) a reference to the adjacency matrix, how would I do it?
I also notice that a SimpleGraph is a structure with fields fadjlist, badjlist, and ne. However, the structure of a weighted graph has a weights field, which is a CSC sparse matrix; the fields present in SimpleGraph are not there. I guess directly accessing the fields of a SimpleGraph is poor programming practice. I still feel that SimpleGraph is a special case of a SimpleWeightedGraph with a weight of 1. At least that is how I would have implemented it. But I think I understand your point about the type hierarchy. So perhaps an AbstractGraph should really be a weighted graph.
How would you implement a multilayer model? Multilayers are collections of graphs G_i whose respective nodes are linked together. There is an entire theory on that (and a few libraries, at least one in Python). I just learned about this. I was thinking of implementing an SIR model where people would be modeled on two graphs: on one graph, they are at home, and on the other, they are at work. Multilayer graphs would (in principle) allow this. More specifically, I would need a multiplex graph (where the same nodes of two graphs are connected together). I am thinking out loud.
Here is a more serious question:
const G00 = F.erdos_renyi(100000, 0.0006, is_directed=true)
const GW = SimpleWeightedGraph(G00)
I expected GW and G00 to be the same graph, but with G00 converted into a weighted graph with the same topology. However:
> ne(G00)
6002665
> ne(GW)
6000925
The number of edges is different in both graphs. How is that possible?
Although my analysis code will run because a graph is a graph, this is disconcerting.
The answer to my own question is that I should have written:
const GW0 = SimpleWeightedDiGraph(G00)
instead of using an undirected SimpleWeightedGraph with a directed graph as an argument. This is an example where my error should have been caught by the package through proper typing. The argument of a directed graph generator should be a directed graph and vice-versa with undirected graphs. A simple conditional statement using the is_directed method would work to this effect.
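A sketch of the kind of guard being suggested; to_weighted_undirected is a hypothetical helper, not something provided by the package:
using LightGraphs, SimpleWeightedGraphs

# Hypothetical wrapper: refuse to build an undirected weighted graph from a
# directed input instead of silently merging reciprocal edges.
function to_weighted_undirected(g)
    is_directed(g) && throw(ArgumentError("expected an undirected graph"))
    return SimpleWeightedGraph(g)
end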
At least for the JuliaGraphs ecosystem it is. Please don’t access internal fields of any JuliaGraphs structs. We can’t prevent it because there’s no such thing as access control in Julia, but it’s undefined behavior and your code is pretty much guaranteed to break on you at some point without warning if you do this.
I still feel that SimpleGraph is a special case of a SimpleWeightedGraph with a weight of 1. At least that is how I would have implemented it.
From a theoretical perspective, you’re correct. This is what a SimpleGraph is. But from a computational complexity perspective, using an adjacency list to represent a graph (what we do with the fadj and badj fields) provides much better performance in several key use cases over a sparse matrix (which is what SWG uses). For even better performance with unweighted large graphs, check out StaticGraphs.jl. The tradeoff there is that you get an amazing cache-locality benefit but you lose the ability to modify the graph after creation. It’s all about tradeoffs when it comes right down to it.
I don’t know of a multilayer model package. There was some discussion in Slack a few weeks ago but I wasn’t really participating in it, and unfortunately a clever solution using existing packages escapes me right now so I don’t have any recommendation.
Because G00 is a directed graph, which means that an edge from i to j is different than an edge from j to i (and both are counted as distinct edges). GW is an undirected graph, which means that an edge from i to j is the same as its reverse. Therefore, I would expect fewer edges in the undirected version of a directed random graph, as, for example, a directed graph with edges from 1 -> 2 and 2 -> 1 will show up as having two edges, while the corresponding undirected graph will show one edge.
The argument of a directed graph generator should be a directed graph and vice-versa with undirected graphs.
We assume that when you call a constructor for a graph of a specific type you know what you’re doing. There is a use case for creating undirected graphs from directed ones, and we don’t want to foreclose on that.
Edited to add: in the forthcoming LG 2.0, graph generators are now type-dispatched constructors, so for example,
g = erdos_renyi(100_000, 0.0006, is_directed=true)
becomes a constructor:
g = SimpleDiGraph(ErdosRenyi(100_000, 0.0006))
Creating the undirected version is left as an exercise to the reader.
Sorry I missed this question. For a SimpleWeightedGraph, you would go against my earlier recommendation and just access weights(g). But that won’t work for other graph types, and it’s not guaranteed to work for future uses of SimpleWeightedGraph. It’s also exceedingly fragile: if you have an undirected graph and you modify the matrix so that it’s no longer symmetric, you will (probably silently) get errors in your code that will be very difficult to detect. But if you must have a reference, that’s the current best way to do it.
But please don’t.
No worries, Seth. I have programmed good C++ for a long time and I understand encapsulation best practices. But that does not detract from my curiosity.
Again, I’m happy to answer these questions. They’re forcing me to think about design decisions we made several years ago, and make sure they’re still valid. Having new eyes on things, especially now that we’re in the middle of a large redesign of the package, is super helpful. So thanks, and again - keep the questions coming. I’ll do my best to be responsive.
Edited to add
good C++
Never heard of it.
Here is a question on the weighted graph:
I would like the weight of edge i,j; so, for example:
w = edge(i,j)
Alternatively, the weights of all the neighbors of i would be nice:
w = neighbors_weights(g, i)
where w would return an array.
I currently have the following in my code:
for ν in neighbors(G, node_u.index)
findTransSIR(Q, t, τ, w, node_u, nodes[ν], t_max, nodes, expo_τ)
end
But I also need the weights along with the neighbors, so perhaps:
for (w, v) in neighbors(WG, node_u.index)
# etc
end
in the case where WG is a weighted graph. What is the easiest way to accomplish this currently?
Thanks.
I would like the weight of edge i,j
You’d like to set it or access it? For the former, add_edge!(g, i, j, w). For the latter, weights(g)[i, j].
the weights of all the neighbors of i would be nice
Naively, this is weights(g)[i, :] (for undirected graphs, and for the outneighbors of directed graphs; for all neighbors, in and out, of directed graphs, you’ll want to take a union, which gets tricky). Note also that if you’ve removed edges, you will have some zeros in there that you’ll need to filter.
But I also need the weights along with the neighbors
These will be the non-zero entries, and their indices, of the above output, which is itself a sparse vector.
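A sketch for the undirected case (nbrs and ws are just local names chosen here):
using LightGraphs, SimpleWeightedGraphs, SparseArrays

wg = SimpleWeightedGraph(erdos_renyi(100, 0.05))
i = 1

row = weights(wg)[i, :]   # sparse vector of weights on edges incident to i
nbrs, ws = findnz(row)    # neighbor indices and the corresponding weights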
Got it! Your answers make sense, and I would eventually have figured it out. I would not have guessed add_edge! since you are not adding an edge. The exclamation point usually means change the first argument, with no convention regarding the remaining arguments. Seems to me that add_edge! could be renamed, or an alias called get_weight(g,i,j,w) could be provided. Of course, I could also create the alias :-). Thanks!
erm, I think there may be some confusion. add_edge!(g, i, j, w) does modify its first argument, g, by adding an edge in graph g from i to j with weight w. For SimpleWeightedGraphs, this is also how you “reweight” an existing edge.
If you wanted just to access the weight of an edge that you know exists between i and j, you use weights(g)[i, j]. There’s no modification done during an access.
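A small sketch of the two operations side by side:
using LightGraphs, SimpleWeightedGraphs

g = SimpleWeightedGraph(3)
add_edge!(g, 1, 2, 2.0)   # create edge 1-2 with weight 2.0
add_edge!(g, 1, 2, 5.0)   # same call on the existing edge: re-weights it to 5.0
weights(g)[1, 2]          # 5.0 -- a pure read, no modification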
I was not very observant. I fixed my program. Adding the weights added 10 seconds to a program that took 18 seconds. That is not really acceptable. I must be doing something wrong, or else I should be able to access the weights more efficiently. Without providing full source code, here is what I have:
wghts = weights(G) #### ADDED
for ν in neighbors(G, node_u.index)
w = wghts[node_u.index, ν] #### ADDED
findTransSIR(Q, t, τ, w, node_u, nodes[ν], t_max, nodes) #### ADDED w argument
end
# etc
function findTransSIR(Q, t, τ, w, source, target, t_max, nodes)
# w is the edge weight
if target.status == :S
inf_time = t + rand(Exponential(Ď„*w)) #### Added *w
# Allocate memory for this list
if inf_time < minimum([source.rec_time, target.pred_inf_time, t_max])
new_event = Event(target, inf_time, :transmit)
Q[new_event] = new_event.time
target.pred_inf_time = inf_time
end
end
end
Nothing there looks out of line. Indexing into wghts is an O(log n) operation, I believe. I can’t imagine this takes significantly longer than the code without the weights accessor.
Are you adding edges or doing any graph modification as part of your timed routines? Here’s some benchmark code that tests accessing the weights:
julia> g = LightGraphs.SimpleDiGraph(100_000, 5_000_000)
{100000, 5000000} directed simple Int64 graph
julia> s = SimpleWeightedDiGraph(g)
{100000, 5000000} directed simple Int64 graph with Float64 weights
julia> for e in edges(s)
add_edge!(s, src(e), dst(e), rand())
end
julia> a = [(src(e), dst(e)) for e in edges(s) if (src(e) + dst(e)) % 4 != 0]; # just filter arbitrarily, keeping order
julia> length(a)
3751196
julia> function foo(g, a)
w = weights(g)
n = 0.0
for i in a
n += w[i[1], i[2]]
end
return n
end
foo (generic function with 1 method)
julia> @benchmark foo(s, a)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 53.333 ms (0.00% GC)
median time: 53.550 ms (0.00% GC)
mean time: 53.568 ms (0.00% GC)
maximum time: 54.033 ms (0.00% GC)
--------------
samples: 94
evals/sample: 1
You can see that sequentially accessing an ordered subset (3.75 million) edges in a SimpleWeightedDigraph with 100k nodes and 5 million edges takes approximately 53 milliseconds on average.
Yes, I see that. Of course, I do not know how many loops I go through, so first I must do more investigation. Something is slightly messed up. Will let you know :-).
Here is an update. My long time was because my graph had 30 million edges! Now my graph has 6M edges (12 M bi-directional edges). Running an unweighted graph takes 6.2 sec, while the weighted graph takes 8.9 seconds, an increase of 43%. Much better than before. I do not know whether this is believable or not.
Your timing suggests that 5 million edges (I am rounding off) takes 50 milliseconds, so that is 50*1.e-3/(5e6) = 10^(-8) sec per weight access = 10 ns.
In my code, I compute the weight 120 times per infection and there were approximately 100,000 infections. So that is 1.2e8 weight accesses, which is 1.2 seconds. My code timing increased by 2.2 seconds, so at least we are in the right ballpark. There are other expenses such as an additional function argument, and an additional multiply in an Exponential distribution. So the numbers are now believable. I am sure we can do better, at most a factor of 50% (for the entire code), but this is good enough for now.
3.75 million edges (not 5m) took 53 ms - so it’s about 15ns per access, which gets you ~1.8 seconds, so you’re even more in the ballpark.
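For reference, the same arithmetic as a quick sketch (using only the figures quoted above):
per_access = 53e-3 / 3.75e6      # ≈ 1.4e-8 s, i.e. roughly 14-15 ns per weight lookup
total      = 1.2e8 * per_access  # ≈ 1.7 s for the 1.2e8 lookups in the SIR run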