I have a master graph with 250,000 nodes, and a variable called homes
which is a Vector of Vectors of size 4, with values between 1 and 100,000. The idea is that many vector of size 4 has a corresponding dense graph, which is embeded in the master graph. The master graph can either be a SimpleGraph
or a SimpleWeightedGraph
. In the former case, the code runs super fast in about 0.15 sec. In the latter case (with SimpleWeightedGraph
), the code takes 24 sec. And yet, when there are only a low number of lists of 4 elements, the SimpleWeightGraphs
take only about twice as long as the SimpleGraphs
. My conclusion: add_edge!
for SimpleWeightedGraph
becomes less and less efficient as more and more edges are added. If this is not the case, I have no idea why the code becomes so inefficient. Needless to say, I cannot accept this for my use case.
function ccreateHomeGraph(nb_nodes)
#mgh = SimpleWeightedGraph(nb_nodes)
mgh = SimpleGraph(nb_nodes) # <<<< Should be weighted graph?
homes = rand(1:100000, 100000)
homes = reshape(homes, div(length(homes), 4), 4)
homes = [homes[i,:] for i in 1:size(homes,1)]
@show size(homes)
#homes = [[1,2,3,4,5] for i in 1:100000]
weight = 0.3 # need more generality
for r in 1:length(homes)
person_ids = homes[r]
sz = length(person_ids)
@show r, sz
for i in 1:sz
for j in i+1:sz
#add_edge!(mgh, person_ids[i]+1, person_ids[j]+1, weight)
add_edge!(mgh, person_ids[i]+1, person_ids[j]+1)
end
end
end
return mgh
end
# SimpleGraph about 2x speed of Weighted SimpleGraphs
@time graph = ccreateHomeGraph(250000)
Here is the add_edge!
function that is called. Does it become more and more inefficient the more edges are added? If so, why? This function is found in simplegraphs.jl
:
# add_edge! will overwrite weights.
function add_edge!(g::SimpleWeightedGraph, e::SimpleWeightedGraphEdge)
T = eltype(g)
U = weighttype(g)
s_, d_, w = Tuple(e)
if w == zero(U)
@warn "Note: adding edges with a zero weight to this graph type has no effect." maxlog=1 _id=:swg_add_edge_zero
return false
end
s = T(s_)
d = T(d_)
(s in vertices(g) && d in vertices(g)) || return false
@inbounds g.weights[d, s] = w
@inbounds g.weights[s, d] = w
return true
end
After reading more threads on this subject, I recoded to work with MetaGraphs
, which runs the full problem in about 40 sec, although it took about 3 seconds with SimpleGraphs
. So scaling as a function of the number of edges is improved with MetaGraphs
; however, the time is substantially more expensive than with SimpleGraphs
. Looking at !add_edge()
in MetaGraphs.jl
, I fell there is lots of room for improvement if the library contained optimized versions of the add_edge!
function. For example, in many use cases, mine, in particular, I know that my end nodes are in the vertex set, so there is no need to check. How expensive is the check? Is it an O(1)
check? Retrieving a property from each edge also strikes me as expensive. Just keeping a dictionary{Edge, Property} strikes me as far more efficient. Perhaps that is what MetaGraphs does already?