Looping over edges in LightGraphs

I have created a graph with 100,000 vertices and 30 edges per vertex. I would like to loop over the edges to prestore the edge endpoints, like this:

# Prestore all the edge endpoints
nb_edges = ne(graph)
nodes = zeros(Int, 2, nb_edges)
# 2-second for loop over 3,000,000 edges. Why?
@time for (i, e) in enumerate(edges(graph))
    nodes[1, i] = src(e)
    nodes[2, i] = dst(e)
end

The loop takes 2 seconds on a single thread on a MacBook Pro. I feel that it should be possible to loop much faster. This loop allocates 1.2 GBytes. Note that nodes is preallocated, so I am not sure why there are so many allocations. This corresponds to an average of 400 bytes allocated per edge, which seems excessive. Aren't the edges prestored somewhere?

Thanks…

Are you benchmarking in global scope? If so, that’s your problem.
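For example, the difference looks roughly like this (a minimal sketch; the array and function names are made up, not taken from your code):

x = zeros(Int, 10^6)

# In global scope, x is an untyped global, so the loop allocates on every iteration
@time for i in eachindex(x)
    x[i] = i
end

# Passing the array into a function removes almost all of the allocations
function fill_range!(x)
    for i in eachindex(x)
        x[i] = i
    end
    return x
end

@time fill_range!(x)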

You are correct! I created a function, and the time drops to 0.34 seconds, give or take. Note, though, that I can get the adjacency matrix in 0.04 seconds, so I should be able to extract the edges from that matrix in much less time than my loop takes. My own loop allocates 182 MBytes, while the adjacency_matrix call allocates 78 MBytes! I should be able to get what I need from the adjacency matrix with essentially zero allocations, roughly 10x faster.

function tst(graph)
    @time for (i,e) in enumerate(edges(graph))
        nodes[1, i] = src(e)
        nodes[2, i] = dst(e)
    end
    @time adj = adjacency_matrix(graph)
end

tst(graph)

When will I get used to this? If graph were constant, I’d get even faster speed.

Looks like your nodes object is still a global. Either make it const (if it stays global) or pass it to the function as an argument.

In the latter case it’s customary to make nodes the first argument and add an exclamation mark to the function name to indicate that you’re mutating the first argument, i.e. tst!(nodes, graph).
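Something along these lines, for example (a sketch of the suggested signature; it assumes graph and LightGraphs are already in scope):

function tst!(nodes, graph)
    for (i, e) in enumerate(edges(graph))
        nodes[1, i] = src(e)
        nodes[2, i] = dst(e)
    end
    return nodes
end

nodes = zeros(Int, 2, ne(graph))
@time tst!(nodes, graph)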

@under-Peter: thank you! Here is my test function with the new timings, which beat adjacency_matrix by a factor of two, which is what I would expect.

I have one question, though: the edge structure has two fields, src and dst. So why are the two node numbers referred to as src(e) and dst(e) rather than e.src and e.dst? Is it only a matter of encapsulation? I tried both approaches and the timings did not change, which suggests inlining rather than an actual function call.

function tst()
    nb_nodes = 100000;
    edges_per_vertex = 30
    graph = random_regular_digraph(nb_nodes, edges_per_vertex);
    nb_edges = ne(graph)
    nb_nodes = nv(graph)
    nodes = zeros(Int,2,nb_edges)

    println("\n")
    @time for (i,e) in enumerate(edges(graph))
        nodes[1, i] = src(e)
        nodes[2, i] = dst(e)
    end
    println("Loop over nodes completed\n")
    @time adj = adjacency_matrix(graph)
    println("adjacency_matrix completed\n")
end

  0.020194 seconds
Loop over nodes completed

  0.049738 seconds (100.01 k allocations: 77.986 MiB)
adjacency_matrix completed

Well, it depends on what you need. edges gives you an iterator over the edges of a graph; adjacency_matrix gives you a sparse matrix (whose element type you can choose) in which the nonzero entries denote edges.
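If you really do want the endpoints out of the matrix, one option is findnz from SparseArrays, sketched below; note that it allocates fresh index vectors, and for a digraph which vector holds the sources depends on the dir keyword used when building the matrix:

using SparseArrays

A = adjacency_matrix(graph)   # SparseMatrixCSC; nonzero entries mark edges
rows, cols, _ = findnz(A)     # one (row, col) pair per stored edge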

I should be able to get what I need from the adjacency matrix with essentially zero allocations, roughly 10x faster.

Why do you think that?

Your allocations are not coming from edges:


julia> g = Graph(1_000_000, 3_000_000)  # 10x the vertices of your sample graph, same number of edges
{1000000, 3000000} undirected simple Int64 graph

julia> function count(g)
           i = 0
           for e in edges(g)
               i += (src(e) + dst(e))
           end
           return i
       end
count (generic function with 1 method)

julia> @benchmark count(g)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     71.168 ms (0.00% GC)
  median time:      71.743 ms (0.00% GC)
  mean time:        72.377 ms (0.00% GC)
  maximum time:     84.359 ms (0.00% GC)
  --------------
  samples:          70
  evals/sample:     1

Because in LightGraphs we work around the fact that all struct fields are public in Julia by making accessor functions the official API. src(e) inlines to e.src in this case, but that is not always guaranteed. This allows us to change the data structure behind edges without affecting the API. (See, e.g., g.fadj, which you should never use under any circumstances.)
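For illustration, the pattern is roughly this (a standalone sketch, not LightGraphs’ actual source):

struct MyEdge
    src::Int
    dst::Int
end

# The accessor functions, not the field names, are the public API,
# so the internal layout can change without breaking callers
src(e::MyEdge) = e.src
dst(e::MyEdge) = e.dst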

Also, I note you’re timing the construction of the 2 × 3M endpoint matrix in your benchmark, which on my system takes 0.2 seconds (as opposed to 71 ms just to iterate over the edges).

Thanks. Given that inlining is not guaranteed, performance is therefore not guaranteed either.

The moment it becomes a problem, it’s a bug in the compiler. Accessing a field of a plain immutable struct should always be inlined.
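If you want to see that happen, @code_typed makes the inlining visible (a sketch with a made-up struct and accessor):

struct E
    src::Int
    dst::Int
end

getsrc(e::E) = e.src           # trivial accessor
usesrc(e::E) = getsrc(e) + 1   # caller that goes through the accessor

@code_typed usesrc(E(1, 2))
# the optimized IR shows a plain getfield(e, :src) followed by the add;
# the call to getsrc has been inlined away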
