# Help speed up SimpleGraph creation

Here’s a MWE of what I’m attempting to do:

``````using DataFrames
using LightGraphs
using ProgressMeter

df = DataFrame(a = rand(1:50_000, 150_000), b = rand(1:25_000, 150_000)) |> unique!
vs = unique(vcat(df.a, df.b))
n = length(vs)
g = SimpleGraph(n)

p = Progress(n)
connected_vs = filter(row -> row.a == vs[v], df).b
for cv in connected_vs
add_edge!(g, v, findfirst(x -> x == cv, vs))
end
next!(p)
end
``````

This example will take 10 - 15 minutes to complete on my machine. My real problem is larger and ETA is about 50 minutes. Surely there’s a more performant way??

EDIT: It looks like the `filter` function is a primary culprit. Changing to this yields much better results:

``````df = DataFrame(a = rand(1:5_000, 15_000), b = rand(1:2_500, 15_000)) |> unique!
vs = unique(vcat(df.a, df.b))
n = length(vs)
g = SimpleGraph(n)

for cv in df[df.a .== vs[v], :b]
add_edge!(g, v, findfirst(x -> x == cv, vs))
end
end
``````

Indeed. If the data was ordered relative to the `a` fields, you could use just a slice instead, and that would be much faster.

1 Like

I don’t see a `Base.slice`, is there a package that exports a `slice` function?

What I meant was that you could use something like

``````i = 1
for v in vertices(g)
j = findlast(x->x==v,df.a)
if j != nothing
cv = df.b[i:j]
i = j + 1
end
end

``````

This appears to be about ~3 times faster than using `cv = df[df.a .== vs[1], :b]`, but I noticed
that this alternative appears to be fast enough, isn’t it?

(I didn’t test the code to guarantee that it exactly returns the same thing you want).

1 Like

I see, thanks! Yes, just changing the `find` function took care of it.