Help speed up SimpleGraph creation

mthelm85 · October 21, 2020, 3:12pm

Here’s a MWE of what I’m attempting to do:

using DataFrames
using LightGraphs
using ProgressMeter

df = DataFrame(a = rand(1:50_000, 150_000), b = rand(1:25_000, 150_000)) |> unique!
vs = unique(vcat(df.a, df.b))
n = length(vs)
g = SimpleGraph(n)

p = Progress(n)
Threads.@threads for v in vertices(g)
    connected_vs = filter(row -> row.a == vs[v], df).b
    for cv in connected_vs
        add_edge!(g, v, findfirst(x -> x == cv, vs))
    end
    next!(p)
end

This example takes 10 to 15 minutes to complete on my machine. My real problem is larger, with an ETA of about 50 minutes. Surely there’s a more performant way?

EDIT: It looks like the filter function is a primary culprit. Changing to this yields much better results:

df = DataFrame(a = rand(1:5_000, 15_000), b = rand(1:2_500, 15_000)) |> unique!
vs = unique(vcat(df.a, df.b))
n = length(vs)
g = SimpleGraph(n)

Threads.@threads for v in vertices(g)
    for cv in df[df.a .== vs[v], :b]
        add_edge!(g, v, findfirst(x -> x == cv, vs))
    end
end
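Two things in the loop above may still be worth a look. First, findfirst(x -> x == cv, vs) scans vs linearly for every edge. Second, as far as I know add_edge! on a SimpleGraph is not designed to be called from several threads at once, so the Threads.@threads loop could corrupt the graph. A minimal sketch of one alternative follows, assuming the two columns can be treated as plain vectors (a and b here are small stand-ins for df.a and df.b, and index_of is a name made up for this example):

```julia
using LightGraphs

# Small stand-in data for the two DataFrame columns (df.a and df.b in the post).
a = [10, 10, 20, 30, 30]
b = [20, 40, 40, 10, 50]

# Vertex labels, in the same order the original code builds them.
vs = unique(vcat(a, b))

# Hypothetical lookup table: label -> vertex index, built once.
# This replaces the linear findfirst scan with a hash lookup.
index_of = Dict(label => i for (i, label) in enumerate(vs))

g = SimpleGraph(length(vs))

# Single-threaded on purpose: add_edge! mutates shared adjacency lists.
# Each row of the table contributes one edge between its a and b labels.
for (x, y) in zip(a, b)
    add_edge!(g, index_of[x], index_of[y])
end
```

Iterating over rows directly also removes the per-vertex filtering step, since every row already names both endpoints of its edge.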

lmiq · October 21, 2020, 3:54pm

Indeed. If the data were sorted by the a column, all rows for a given value would be contiguous, so you could take a slice of the b column instead of filtering, and that would be much faster.

mthelm85 · October 21, 2020, 3:59pm

I don’t see a Base.slice, is there a package that exports a slice function?

lmiq · October 21, 2020, 5:54pm

What I meant was that you could use something like

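# Assumes df is sorted by column a (e.g. after sort!(df, :a)),
# so that all rows with the same a value are contiguous.
i = 1
for v in vertices(g)
    j = findlast(x -> x == vs[v], df.a)
    if j !== nothing
        cv = df.b[i:j]   # rows i:j all have a == vs[v]
        i = j + 1
    end
end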

This appears to be about 3 times faster than using cv = df[df.a .== vs[1], :b], but I noticed
that the indexing version already appears to be fast enough, doesn’t it?

(I didn’t test the code to guarantee that it exactly returns the same thing you want).
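A related way to get the same contiguous slice, sketched here with plain vectors standing in for df.a and df.b, is to sort once and then use Base's searchsorted, which returns the index range of all entries equal to a given value in a sorted vector. The helper name neighbors_of is hypothetical, introduced only for this example:

```julia
# Small stand-in data for the two DataFrame columns (df.a and df.b in the thread).
a = [30, 10, 20, 10, 30]
b = [10, 20, 40, 40, 50]

# Sort both columns by a, keeping rows aligned.
perm = sortperm(a)
a_sorted = a[perm]
b_sorted = b[perm]

# Hypothetical helper: all b values whose row has a == label.
# searchsorted does a binary search and returns a (possibly empty) index range.
neighbors_of(label) = b_sorted[searchsorted(a_sorted, label)]

for label in unique(a_sorted)
    println(label, " => ", neighbors_of(label))
end
```

A label that never occurs in a yields an empty range, so neighbors_of returns an empty vector rather than throwing an error.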

mthelm85 · October 21, 2020, 5:58pm

I see, thanks! Yes, just changing the find function took care of it.
