Suggestions for performance enhancement in LightGraphs `diffusion`?

I’m running into performance bottlenecks running a diffusion model I put into LightGraphs.jl (code here and was wondering if anyone might have any suggestions. (p.s. I’m not entirely clear if this belong here or stack overflow – please let me know if I’m in the wrong place!). In my research application, I’m running this on networks of about 10_000_000 nodes thousands of times, so performance really matters.

Profiling the code, I find 70% of run time is generally in Line 57:

   union!(new_infections, randsubseq(outn, local_p))

Which may mean there’s nothing to improve on. But I’m not yet fully comfortable working with @code_warntype, so it strikes me that maybe I’m missing something? It’s relatively clean, but is popping up some read. I think think any of that red matters, but like I said, I’m new to this, and have spent a week trying to find improvements, so would love to know if I just need to parallelize more, or if I’m missing something.

You can get the relevant @code_warntype with:

using LightGraphs
g = erdos_renyi(1_000, 5_000, seed=42)
@code_warntype diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700), initial_infections=[400])

Or read:

julia> @code_warntype diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700),
  watch::AbstractArray{T,1} where T
  initial_infections::AbstractArray{T,1} where T

      NewvarNode(:(watch::AbstractArray{T,1} where T))
      NewvarNode(:(initial_infections::AbstractArray{T,1} where T))
      #temp#@_14::Bool = true
      #temp#@_15::Bool = true
      normalize::Bool = false
      SSAValue(2) = (Base.arraylen)(#temp#@_2::Array{Any,1})::Int64
      SSAValue(3) = (Base.select_value)((Base.sle_int)(0, 1)::Bool, (Base.ashr_int)(SSAValue(2), (Base.bitcast)(UInt64, 1))::Int64, (Base.shl_int)(SSAValue(2), (Base.bitcast)(UInt64, (Base.neg_int)(1)::Int64))::Int64)::Int64
      SSAValue(7) = (Base.select_value)((Base.sle_int)(1, SSAValue(3))::Bool, SSAValue(3), (Base.sub_int)(1, 1)::Int64)::Int64
      #temp#@_10::Int64 = 1
      unless (Base.not_int)((#temp#@_10::Int64 === (Base.add_int)(SSAValue(7), 1)::Int64)::Bool)::Bool goto 39
      SSAValue(8) = #temp#@_10::Int64
      SSAValue(9) = (Base.add_int)(#temp#@_10::Int64, 1)::Int64
      #temp#@_7::Int64 = SSAValue(8)
      #temp#@_10::Int64 = SSAValue(9)
      #temp#@_8::Int64 = (Base.sub_int)((Base.mul_int)(#temp#@_7::Int64, 2)::Int64, 1)::Int64
      #temp#@_9::Any = (Core.arrayref)(#temp#@_2::Array{Any,1}, #temp#@_8::Int64)::Any
      unless (#temp#@_9::Any === :normalize)::Bool goto 21
      normalize::Bool = (Core.typeassert)((Core.arrayref)(#temp#@_2::Array{Any,1}, (Base.add_int)(#temp#@_8::Int64, 1)::Int64)::Any, LightGraphs.Bool)::Bool
      goto 37
      unless (#temp#@_9::Any === :initial_infections)::Bool goto 26
      initial_infections::AbstractArray{T,1} where T = (Core.typeassert)((Core.arrayref)(#temp#@_2::Array{Any,1}, (Base.add_int)(#temp#@_8::Int64, 1)::Int64)::Any, LightGraphs.AbstractVector)::AbstractArray{T,1} where T
      #temp#@_15::Bool = false
      goto 37
      unless (#temp#@_9::Any === :watch)::Bool goto 31
      watch::AbstractArray{T,1} where T = (Core.typeassert)((Core.arrayref)(#temp#@_2::Array{Any,1}, (Base.add_int)(#temp#@_8::Int64, 1)::Int64)::Any, LightGraphs.AbstractVector)::AbstractArray{T,1} where T
      #temp#@_14::Bool = false
      goto 37
      SSAValue(10) = ::LightGraphs.#diffusion
      SSAValue(11) = g::LightGraphs.SimpleGraphs.SimpleGraph{Int64}
      SSAValue(12) = p::Float64
      SSAValue(13) = n::Int64
      (Base.throw)($(Expr(:new, :(Base.MethodError), :((Core.getfield)((Core.getfield)((Core.getfield)(LightGraphs.#diffusion, :name)::TypeName, :mt), :kwsorter)), :((Core.tuple)(#temp#@_2, SSAValue(10), SSAValue(11), SSAValue(12), SSAValue(13))::Tuple{Array{Any,1},LightGraphs.#diffusion,LightGraphs.SimpleGraphs.SimpleGraph{Int64},Float64,Int64}), 0xffffffffffffffff)))::Union{}
      goto 10
      unless #temp#@_14::Bool goto 42
      watch::AbstractArray{T,1} where T = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, 0, 0))
      unless #temp#@_15::Bool goto 50
      $(Expr(:inbounds, false))
      # meta: location /Users/Nick/.julia/v0.6/LightGraphs/src/graphtypes/simplegraphs/SimpleGraphs.jl vertices 44
      SSAValue(6) = (Base.arraylen)((Core.getfield)(g::LightGraphs.SimpleGraphs.SimpleGraph{Int64}, :fadjlist)::Array{Array{Int64,1},1})::Int64
      # meta: pop location
      $(Expr(:inbounds, :pop))
      initial_infections::AbstractArray{T,1} where T = $(Expr(:invoke, MethodInstance for (::LightGraphs.#kw##sample!)(::Array{Any,1}, ::LightGraphs.#sample!, ::MersenneTwister, ::Array{Int64,1}, ::Int64), :($(QuoteNode(LightGraphs.#sample!))), :($(Expr(:invoke, MethodInstance for vector_any(::Any, ::Vararg{Any,N} where N), :(Base.vector_any), :(:exclude), ()))), :(LightGraphs.sample!), :($(Expr(:invoke, MethodInstance for getRNG(::Int64), :(LightGraphs.getRNG), -1))), :($(Expr(:invoke, MethodInstance for vcat(::UnitRange{Int64}), :(Base.vcat), :($(Expr(:new, UnitRange{Int64}, 1, :((Base.select_value)((Base.sle_int)(1, SSAValue(6))::Bool, SSAValue(6), (Base.sub_int)(1, 1)::Int64)::Int64))))))), 1))
      return (LightGraphs.#diffusion#57)(watch::AbstractArray{T,1} where T, initial_infections::AbstractArray{T,1} where T, normalize::Bool, ::LightGraphs.#diffusion, g::LightGraphs.SimpleGraphs.SimpleGraph{Int64}, p::Float64, n::Int64)::Array{Array{Int64,1},1}

Edit: I was wrong, and Julia is doing the right thing already (see @kristoffer.carlsson’s post below). I’ve kept my original post below for completeness.

~~I see one potential issue: the diffusion() function uses keyword arguments with non-concrete types. Keyword arguments are (currently) harder for Julia to generate fast native code for, and I suspect that’s what’s causing the type-instability in the function. Some info on this is here:

Fortunately, it should be pretty easy to improve the situation by using a function barrier. Can you try creating (in lightgraphs locally) a _diffusion() function which takes the same arguments but using only positional arguments, and then change the existing diffusion() to call the positional-argument _diffusion() version? Within _diffusion() Julia will be able to generate specialized code for every variable, and the inner loops should be faster.

If this works, it’s probably worth submitting a PR to LightGraphs.

A function with keyword arguments already contains a function barrier. First, the keyword arguments are sorted in a function, then another function which only contains positional arguments is called.

In other words, usage of keyword arguments in the function is fast but there is some overhead getting there, (also return type might not be inferred).


Oh! Cool, I didn’t know that.

Do you know why the type of watch is an AbstractVector rather than a concrete type in the @code_warntype output? Is the fact that it’s declared as such in the kwargs just a coincidence?

Where does the bottleneck occur in

foo = randsubseq(outn, local_p)
union!(new_infections, foo)


If it’s the union, and it’s because foo is big, you might want to find another way to build a set instead of using the vector that randsubseq returns. If it’s the randsubseq, then there’s probably not a lot you can do about that.

Answering my own question: it appears that it’s the randsubseq. I got some speedup by using the mutating version of it:


randsubseq_buf = zeros(nv(g))


randsubseq!(randsubseq_buf, outn, local_p)
union!(new_infections, randsubseq_buf)

Old performance:

julia> @benchmark diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700), initial_infections=[400])
  memory estimate:  10.16 MiB
  allocs estimate:  146647
  minimum time:     15.954 ms (0.00% GC)
  median time:      19.696 ms (10.20% GC)
  mean time:        19.782 ms (5.81% GC)
  maximum time:     33.713 ms (10.30% GC)
  samples:          253
  evals/sample:     1

New performance:

julia> @benchmark diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700), initial_infections=[400])
  memory estimate:  1.16 MiB
  allocs estimate:  2077
  minimum time:     12.382 ms (0.00% GC)
  median time:      14.393 ms (0.00% GC)
  mean time:        14.728 ms (0.65% GC)
  maximum time:     28.293 ms (0.00% GC)
  samples:          340
  evals/sample:     1

I have not tested for correctness.

Here’s a larger example on a (1m, 5m) E-R graph:

New performance first:

julia> g = erdos_renyi(1_000_000, 5_000_000, seed=42)
{1000000, 5000000} undirected simple Int64 graph

julia> @benchmark diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700), initial_infections=[400])
  memory estimate:  1.04 GiB
  allocs estimate:  4182
  minimum time:     24.961 s (0.56% GC)
  median time:      24.961 s (0.56% GC)
  mean time:        24.961 s (0.56% GC)
  maximum time:     24.961 s (0.56% GC)
  samples:          1
  evals/sample:     1

Original performance:

julia> @benchmark diffusion(g, 0.6, 100, normalize=true, watch=collect(300:700), initial_infections=[400])
  memory estimate:  8.66 GiB
  allocs estimate:  127801473
  minimum time:     31.929 s (19.36% GC)
  median time:      31.929 s (19.36% GC)
  mean time:        31.929 s (19.36% GC)
  maximum time:     31.929 s (19.36% GC)
  samples:          1
  evals/sample:     1

I think that keyword arguments have to be concretely typed to be inferred properly.

@anon94023334 you’re the best, thanks for that! Should have just keep this in LightGraphs. :slight_smile:

The other thing I’ve been playing with is trying to remove already-infected vertices from outn before the randsubseq function, but subsetting seems more costly than leaving them in

not_yet_infected_neighbors = setdiff(outn, infected_vertices)
union!(new_infections, randsubseq(not_yet_infected_neighbors, local_p))

But it slowed down a lot – that setdiff is far more costly then a few more random draws. Similarly if I first convert outn to a set, then do setdiff! then converted back for use in randsubseq (unsurprisingly I guess given the conversions):

outn = Set(outn)
setdiff!(outn, infected_vertices)
union!(new_infections, randsubseq(collect(outn), local_p))

That is the only way I can think of making progress though. I can get through a set of ~1,000 50-step simulations in about 50 minutes, but I’m already over 24 hours for a set of 100-step simulations, so it’s really the later stages of the diffusion process causing problems. Much of that I’m sure is because there are just exponentially more infected vertices in later steps, but it strikes me the one place that there’s “wasted effort” is in re-calculating whether to spread to already-infected nodes. If you have better methods than setdiff to thin that population, please let me know!

Thanks @kristoffer.carlsson, good to know. But seems like those aren’t coming into play in my inner loop where most of my time is being spent, if I’m not mistaken. So probably wouldn’t gain a lot by tightening that?

Since most engagement coming from (the excellent) @anon94023334 just gonna move this to LightGraphs for more domain specific discussion. Thanks all!