LightGraphs + ModelingToolkit optimization

You might want to get into the weeds a bit: from the REPL, run @profile myfunc(...), and after it has finished, call Profile.print(). This will give you a hierarchical call tree with the number of samples for each element in the tree (the larger the number, the larger that block of code's influence on the overall runtime).
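A minimal sketch of that workflow (heavy here is just a stand-in for whatever function is being investigated):

```julia
using Profile

# Stand-in for the function being investigated
heavy(n) = sum(sqrt(i) for i in 1:n)

heavy(10^6)           # warm up first, so compilation is not in the samples
Profile.clear()       # discard any previously collected samples
@profile heavy(10^7)  # collect stack snapshots while the call runs
Profile.print()       # hierarchical call tree with sample counts
```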

Will do. Thanks. I did not know about Profile.print()

One question: can this be done from within a function?

I don’t know. I’ve always done it from the REPL. It can produce a lot of output, and in some cases it will truncate unless you call Profile.print with some additional parameters.

I printed a flat listing, ordered by count (i.e., number of snapshots). I am only listing the most important entries. I cannot tell much from this. I would love a sort by time spent in a function, as opposed to by line. Does anybody know how to do this?
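(For what it’s worth, here is how the flat, count-sorted view can be requested explicitly; as far as I know sortedby only accepts :count or :filefuncline, so sorting by total time per function, aggregated over lines, is not directly available. busy is just a stand-in workload.)

```julia
using Profile

busy(n) = sum(log(i) for i in 1:n)  # stand-in workload
busy(10^6)                          # warm up so compilation is not sampled
Profile.clear()
@profile busy(10^7)

# One row per (file, function, line), sorted by number of snapshots
Profile.print(format=:flat, sortedby=:count)
```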

   223         1 @Base/compiler/ssair/inlining.jl                             978 process_simple!(::Core.Compiler.IRCode, ::Int64, ::Core.Compiler.Params)
   229         0 @Base/boot.jl                                                424 Array
   238         0 @Base/boot.jl                                                414 Array
   267         0 @Base/compiler/ssair/ir.jl                                  1151 iterate(::Core.Compiler.IncrementalCompact, ::Tuple{Int64,Int64})
   268         1 @Base/compiler/ssair/legacy.jl                                10 inflate_ir(::Core.CodeInfo, ::Core.MethodInstance)
   269         2 @Base/compiler/ssair/ir.jl                                   984 process_node!
   292         0 @Base/compiler/abstractinterpretation.jl                     872 abstract_call_known(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.Infere...
   332         0 @Base/compiler/ssair/ir.jl                                  1296 compact!
   360         0 @Base/compiler/ssair/inlining.jl                             549 batch_inline!(::Array{Any,1}, ::Core.Compiler.IRCode, ::Array{Core.LineInfoNode,1}, ::Bool)
   369         2 @Base/compiler/ssair/inlining.jl                             729 analyze_method!(::Int64, ::Core.Compiler.Signature, ::Any, ::Core.SimpleVector, ::Method, ::Expr,...
   396         2 @Base/compiler/ssair/inlining.jl                             995 assemble_inline_todo!(::Core.Compiler.IRCode, ::Core.Compiler.OptimizationState)
   441       333 @Base/compiler/typeutils.jl                                   46 argtypes_to_type
   508         1 @Base/compiler/ssair/driver.jl                               121 run_passes(::Core.CodeInfo, ::Int64, ::Core.Compiler.OptimizationState)
   544         1 @Base/compiler/ssair/driver.jl                               107 just_construct_ssa(::Core.CodeInfo, ::Array{Any,1}, ::Int64, ::Core.Compiler.OptimizationState)
   545         4 @Base/compiler/ssair/inlining.jl                              74 ssa_inlining_pass!
   654       654 @Base/boot.jl                                                405 Array
   842         2 @Base/compiler/ssair/driver.jl                               112 run_passes(::Core.CodeInfo, ::Int64, ::Core.Compiler.OptimizationState)
   884        18 @Base/compiler/ssair/inlining.jl                            1033 assemble_inline_todo!(::Core.Compiler.IRCode, ::Core.Compiler.OptimizationState)
  1223         0 @Base/compiler/ssair/inlining.jl                            1011 assemble_inline_todo!(::Core.Compiler.IRCode, ::Core.Compiler.OptimizationState)
  1378         0 @Base/compiler/abstractinterpretation.jl                      50 abstract_call_gf_by_type(::Any, ::Array{Any,1}, ::Any, ::Core.Compiler.InferenceState, ::Int64)
  1688         0 @Base/compiler/abstractinterpretation.jl                     604 abstract_apply(::Any, ::Any, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.InferenceState, ::In...
  1697         0 @Base/compiler/abstractinterpretation.jl                     673 abstract_call_known(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.Infere...
  2466         0 @Base/compiler/abstractinterpretation.jl                     893 abstract_call(::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.InferenceState, ::I...
  2601      2601 @Base/reflection.jl                                          841 _methods_by_ftype
  2911         9 @Base/compiler/ssair/inlining.jl                              71 ssa_inlining_pass!
  3254         0 @Base/compiler/tfuncs.jl                                    1444 return_type_tfunc(::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.InferenceState)
  3255         0 @Base/compiler/abstractinterpretation.jl                     820 abstract_call_known(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.Infere...
  3456         0 @Base/compiler/ssair/driver.jl                               116 run_passes(::Core.CodeInfo, ::Int64, ::Core.Compiler.OptimizationState)
  3548         0 @Base/compiler/abstractinterpretation.jl                     251 abstract_call_method_with_const_args(::Any, ::Any, ::Array{Any,1}, ::Core.SimpleVector, ::Core.Co...
  3689         2 @Base/compiler/abstractinterpretation.jl                     124 abstract_call_gf_by_type(::Any, ::Array{Any,1}, ::Any, ::Core.Compiler.InferenceState, ::Int64)
  5414         0 @Base/compiler/optimize.jl                                   169 optimize(::Core.Compiler.OptimizationState, ::Any)
  5599         0 @Base/compiler/typeinfer.jl                                   33 typeinf(::Core.Compiler.InferenceState)
  5709         4 @Base/compiler/abstractinterpretation.jl                    1213 typeinf_local(::Core.Compiler.InferenceState)
  6789         0 @Base/compiler/typeinfer.jl                                  488 typeinf_edge(::Method, ::Any, ::Core.SimpleVector, ::Core.Compiler.InferenceState)
  6848         3 @Base/compiler/abstractinterpretation.jl                     404 abstract_call_method(::Method, ::Any, ::Core.SimpleVector, ::Bool, ::Core.Compiler.InferenceState)
  6882         0 @Base/compiler/abstractinterpretation.jl                     101 abstract_call_gf_by_type(::Any, ::Array{Any,1}, ::Any, ::Core.Compiler.InferenceState, ::Int64)
  9086         2 @Base/compiler/abstractinterpretation.jl                    1227 typeinf_local(::Core.Compiler.InferenceState)
  9177         9 @Base/compiler/abstractinterpretation.jl                     873 abstract_call_known(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.Infere...
  9205         3 @Base/compiler/abstractinterpretation.jl                     895 abstract_call(::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.InferenceState, ::I...
  9205         0 @Base/compiler/abstractinterpretation.jl                     974 abstract_eval(::Any, ::Array{Any,1}, ::Core.Compiler.InferenceState)
  9206         2 @Base/compiler/abstractinterpretation.jl                     880 abstract_call(::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Compiler.InferenceState)
  9215         0 @Base/compiler/typeinfer.jl                                  574 typeinf_ext(::Core.MethodInstance, ::Core.Compiler.Params)
  9216         3 @Base/compiler/abstractinterpretation.jl                    1283 typeinf_nocycle(::Core.Compiler.InferenceState)
  9216         3 @Base/compiler/typeinfer.jl                                   12 typeinf(::Core.Compiler.InferenceState)
  9227         0 @Base/compiler/typeinfer.jl                                  605 typeinf_ext(::Core.MethodInstance, ::UInt64)
  9369       142 @Base/loading.jl                                            1080 include_string(::Module, ::String, ::String)
  9369         0 @CodeTools/src/eval.jl                                        30 include_string(::Module, ::String, ::String, ::Int64)
  9369         0 @Atom/src/eval.jl                                            119 (::Atom.var"#206#211"{String,Int64,String,Bool})()
  9369         0 @CodeTools/src/utils.jl                                       30 withpath(::Atom.var"#206#211"{String,Int64,String,Bool}, ::String)
  9369         0 @Atom/src/eval.jl                                              9 withpath(::Function, ::String)
  9369         0 @Atom/src/eval.jl                                            118 #205
  9369         0 @Base/logging.jl                                             396 with_logstate(::Atom.var"#205#210"{String,Int64,String,Bool}, ::Base.CoreLogging.LogState)
  9369         0 @Base/logging.jl                                             503 with_logger
  9369         0 @Atom/src/eval.jl                                            117 #204
  9369         0 @Atom/src/repl.jl                                            127 hideprompt(::Atom.var"#204#209"{String,Int64,String,Bool})
  9369         0 @Atom/src/eval.jl                                            116 macro expansion
  9369         0 /Users/erlebach/.julia/packages/Media/ItEPc/src/dynamic.jl    24 macro expansion
  9369         0 @Atom/src/eval.jl                                            113 eval(::String, ::Int64, ::String, ::String, ::Bool)
  9369         0 @Base/essentials.jl                                          712 invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},...
  9369         0 @Base/essentials.jl                                          711 invokelatest(::Any, ::Any, ::Vararg{Any,N} where N)
  9369         0 @Atom/src/eval.jl                                             41 macro expansion
  9369         0 @Base/task.jl                                                358 (::Atom.var"#188#189")()
Total snapshots: 39039

I don’t know how to interpret that, and I may have taken you down a distracting path. The way I usually do it is use the nested (hierarchical) output to determine the heavy-hitting functions, and then drill down into the specific lines within those functions that have the most samples, and see what I can do to optimize them.

Again, this may have been a distraction. (But I don’t see any of your code in there. I do see a lot of Atom stuff. I’d recommend doing this in a standalone REPL.)

Ok. I understand the point about Atom and a standalone REPL. I also profiled just JumpProblem, but I do not see anywhere that the JumpProblem function is called. Perhaps the sampling missed it? Hard to believe. The profiling tools are still quite primitive. I might have to stop and simply investigate a jump implementation that does not use ReactionSystem. I am convinced that it is its data structure that makes good performance difficult to achieve. Rather than using arrays of floats to store the reaction equations, they are stored one complex data structure per equation, which is inefficient to say the least when all the equations have the same structure, as is the case here. More precisely, all S equations have the same structure, as do all I equations, as do all R equations. Therefore, only three data structures should be necessary.

In my experience LLVM has a problem with functions with lots of instructions (as is probably the case here, since the function is constructed with ModelingToolkit), and the time spent in LLVM definitely grows superlinearly with the number of instructions, which is quite frustrating.

That would be compile times, which isn’t measured here IIUC. The building of the function shouldn’t take this long. I think we’re doing something wrong.

I believe the problem is the complexity of the Reaction data structures that exist for each equation separately. You are probably getting cache thrashing. The equations should be divided into groups of identical structure (i.e., identical data access and memory usage), and each group should be treated sequentially. Perhaps this is what happens, but the output of ReactionSystem does not suggest it.
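To illustrate what I mean (a hypothetical sketch, not how ReactionSystem actually stores things): with only three structural groups, each group's rates can live in one flat array and be refreshed in a single tight loop. Here srcs/dsts are assumed edge-endpoint arrays for the network:

```julia
# Hypothetical "structure-of-arrays" layout for an SIR network: every
# infection rate has the form beta*S[i]*I[j] and every recovery rate
# gamma*I[v], so each group needs only one contiguous array.
struct GroupedSIRRates
    infection::Vector{Float64}   # one entry per directed edge (i -> j)
    recovery::Vector{Float64}    # one entry per vertex
end

function update_rates!(r::GroupedSIRRates, S, I, srcs, dsts, beta, gamma)
    @inbounds for k in eachindex(srcs)    # one tight loop per group
        r.infection[k] = beta * S[srcs[k]] * I[dsts[k]]
    end
    @inbounds for v in eachindex(I)
        r.recovery[v] = gamma * I[v]
    end
    return r
end
```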

Hi @erlebach, would you mind running this and possibly profiling with
`substituter` to avoid Dict re-construction by shashi · Pull Request #432 · SciML/ModelingToolkit.jl · GitHub ?

Eventually it should get released.

Ok, so I am implementing Gillespie with the help of LightGraphs.jl. Here is my code, which is rather short. The graph has 2000 nodes. Gillespie.jl crashes. I had such high hopes. Perhaps Simon Frost can comment. All his examples have only one node, so they are simple and do not tax his library. That seems to be the case in many of the examples.

The code:

using Plots
using DataFrames
#using DiffEqBase
#using DiffEqJump
using LightGraphs
using Gillespie
#using DiffEqBase.EnsembleAnalysis
using Profile
import Random: seed!
#using PProf

const nvertss = 2000

#const dgs = DiGraph(nvertss, nedgess);
# Degree of each node: 2nd argument  5
const dgr = random_regular_digraph(nvertss, 3)

# create a system with all S, I, R equations
function F_sir(x, parms)
    # Views into the flat state vector: S, I, R blocks of length nvertss
    S = @view x[0*nvertss+1:1*nvertss]
    I = @view x[1*nvertss+1:2*nvertss]
    R = @view x[2*nvertss+1:3*nvertss]
    (beta, gamma) = parms
    infections = [beta*S[src(e)]*I[dst(e)] for e ∈ edges(dgr)]
    recoveries = [gamma*I[v] for v ∈ vertices(dgr)]
    vcat(infections, recoveries)
end

nu1a = repeat([-1 1 0], nvertss)
nu1b = repeat([0 -1 1], nvertss)
nu = vcat(nu1a, nu1b)
tf = 200.0
nverts = nvertss

S0 = ones(Int, nverts)
I0 = zeros(Int, nverts)
R0 = zeros(Int, nverts)

function infect(i)
    S0[i] = 0   # this person starts out infected
    I0[i] = 1
    R0[i] = 1 - S0[i] - I0[i]
end

#infect.([1,10,15,25])
infect.([3]) #,10,15,25])
vars0 = vcat(S0, I0, R0);
seed!(1234)

parms = [0.1, 0.1]
result = ssa(vars0, F_sir, nu, parms, tf)

and the error:

signal (11): Segmentation fault: 11
in expression starting at /Users/erlebach/covid_modeling_julia/julia_code/sir-julia/notebook/graph_based/graph_gillepsie.jl:57
getindex at ./array.jl:787 [inlined]
getindex at ./subarray.jl:263 [inlined]
gillespie at /Users/erlebach/.julia/packages/Gillespie/UfDtG/src/SSA.jl:111
unknown function (ip: 0x1341fd242)
#ssa#5 at /Users/erlebach/.julia/packages/Gillespie/UfDtG/src/Gillespie.jl:43
ssa at /Users/erlebach/.julia/packages/Gillespie/UfDtG/src/Gillespie.jl:39
unknown function (ip: 0x1341fc0a7)
jl_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_call at /Users/sabae/buildbot/worker/package_macos64/build/src/interpreter.c:369
eval_body at /Users/sabae/buildbot/worker/package_macos64/build/src/interpreter.c:0
jl_interpret_toplevel_thunk at /Users/sabae/buildbot/worker/package_macos64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /Users/sabae/buildbot/worker/package_macos64/build/src/toplevel.c:814
jl_parse_eval_all at /Users/sabae/buildbot/worker/package_macos64/build/src/ast.c:872
include_string at ./loading.jl:1080
include_string at /Users/erlebach/.julia/packages/CodeTools/kosGY/src/eval.jl:30
unknown function (ip: 0x11502e062)
#206 at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:119
withpath at /Users/erlebach/.julia/packages/CodeTools/kosGY/src/utils.jl:30
withpath at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:9
#205 at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:118 [inlined]
with_logstate at ./logging.jl:396
with_logger at ./logging.jl:503 [inlined]
#204 at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:117 [inlined]
hideprompt at /Users/erlebach/.julia/packages/Atom/LBufP/src/repl.jl:127
macro expansion at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:116 [inlined]
macro expansion at /Users/erlebach/.julia/packages/Media/ItEPc/src/dynamic.jl:24 [inlined]
eval at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:113
unknown function (ip: 0x13419f54a)
jl_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/builtins.c:643
jl_f__apply at /Users/sabae/buildbot/worker/package_macos64/build/src/builtins.c:657 [inlined]
jl_f__apply_latest at /Users/sabae/buildbot/worker/package_macos64/build/src/builtins.c:693
#invokelatest#1 at ./essentials.jl:712
jl_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/builtins.c:643
invokelatest at ./essentials.jl:711
jl_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
do_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/builtins.c:643
macro expansion at /Users/erlebach/.julia/packages/Atom/LBufP/src/eval.jl:41 [inlined]
#188 at ./task.jl:358
unknown function (ip: 0x114fe0d8c)
jl_apply at /Users/sabae/buildbot/worker/package_macos64/build/src/./julia.h:1692 [inlined]
start_task at /Users/sabae/buildbot/worker/package_macos64/build/src/task.c:687
Allocations: 84948891 (Pool: 84929505; Big: 19386); GC: 86

Julia has exited.
Press Enter to start a new session.

So let us try a smaller problem, say with 10 nodes, and 2 edges per node.

const nvertss = 10
const dgr = random_regular_digraph(nvertss, 2)

This time there is no crash, but ssa never returns (or at least takes a very long time).
All this is very frustrating to say the least, since I am looking to apply this to networks on the order of 300,000 nodes. Are there better approaches? Surely, yes. But I want to give Julia a shot. Could some examples be generated with 100,000 equations, to get a sense of what one should expect?

Question: if my libraries were precompiled before the run, why do they need to recompile when I re-enter Julia? Since the precompiled files are located somewhere under $HOME/.julia, this should not be necessary. Fixing this issue would go a long way towards improving the experience. And I do not think that this should require the help of Revise.jl.

I must not be using Gillespie.jl correctly.

Thanks.

You could always directly create just one MassActionJump with states labelled by integers. It can take vector inputs at the lowest level, see:

https://docs.sciml.ai/dev/tutorials/discrete_stochastic_example/#Defining-the-Jumps-Directly:-MassActionJump-1

I plan to play with vectorizing their creation through MT, but I don’t know if that will really give a speed up.

This bypasses ModelingToolkit and is the internal format used in the solvers. Note, you would have to write your own code for calculating some types of dependency graphs needed by SSA solvers though (for example to use RSSA, but not NRM or DirectCR).
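For a toy SIR-style pair of reactions with integer-labelled states, the inputs look roughly like this (a sketch based on the linked tutorial; the MassActionJump and JumpProblem calls are shown as comments because exact signatures may differ across DiffEqJump versions):

```julia
# States labelled 1 = S, 2 = I, 3 = R.
# Reactions: S + I -> 2I (infection), I -> R (recovery).
# Each reaction lists (species => stoichiometric coefficient) pairs.
rates = [0.1, 0.05]
reactant_stoich = [[1 => 1, 2 => 1],   # S + I
                   [2 => 1]]           # I
net_stoich      = [[1 => -1, 2 => 1],  # S decreases, I increases
                   [2 => -1, 3 => 1]]  # I decreases, R increases

# With DiffEqJump loaded, these feed the solver-internal format directly:
#   majump = MassActionJump(rates, reactant_stoich, net_stoich)
#   jprob  = JumpProblem(dprob, Direct(), majump)
```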

At the end of the day though, if you want to go to sufficiently large network sizes you probably need to write a specialized SSA code that can take advantage of your having a graph of connected, identical reactions systems… This is on our TODO, but as far as I am aware no one has done such a code in Julia currently…

Gillespie.jl uses a dense matrix formulation for the setup, but of course that’s not going to scale to huge problems very well. DiffEq’s format is sparser, and I think the right way to go is to just continually optimize how we build the sparse format. There are probably some low hanging fruits around, and Julia v1.5 might give us a few benefits (and v1.6 has a very very nice update for us), but we can also do more specializations, reduce allocations and memory use, etc.

Note though, so far the issues are all in setting up through MT. I don’t think he is having issues with the solvers. So it isn’t really an SSA issue, but more on making the MT conversion process faster. I’ll do some profiling to investigate this more over the weekend.

I guess the problem with DiscreteProblem is from the two varmap_to_vars calls. I don’t see why the rest would give any issues.

I will try to implement my own algorithm as an exercise.

I did some reading on stochastic algorithms, and I believe I can write an algorithm to solve a variety of models on networks at super fast speed using faster, but still exact, versions of Gillespie’s algorithm. The current algorithms in Julia are more general than we need.
My objective is 300,000 nodes and 1.5 million edges. I will start with Anderson’s version of SSA.
I will do this in a separate code I will write. If I am successful, perhaps you guys can optimize further for inclusion into existing libraries. The future functionality should apply to first-order reactions where the number of molecules is one or zero, and where the number of reaction rates is much lower than the number of reactions. This should be blazingly fast, and of use to the community.
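As a starting point, here is a bare-bones sketch of the classic direct method (not yet Anderson’s variant) over a flat propensity vector; rates! and fire! are hypothetical caller-supplied callbacks, and any specialization for few distinct rate constants would go inside them:

```julia
using Random

# Classic Gillespie direct method. `rates!(a, x)` fills the propensity
# vector `a` from state `x`; `fire!(x, j)` applies reaction j in place.
function direct_ssa!(x, a, rates!, fire!, tfinal; rng=Random.default_rng())
    t = 0.0
    while t < tfinal
        rates!(a, x)
        a0 = sum(a)
        a0 <= 0 && break           # nothing left that can fire
        t += randexp(rng) / a0     # exponential waiting time to next event
        r = rand(rng) * a0         # choose reaction j with probability a[j]/a0
        j = 1
        @inbounds while r > a[j] && j < length(a)
            r -= a[j]; j += 1
        end
        fire!(x, j)
    end
    return x, t
end
```

For example, a pure-death process would use rates!(a, x) = (a[1] = x[1]) and fire!(x, j) = (x[1] -= 1).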

What do you think?

Gordon

I missed a message.

" Hi @erlebach, would you mind running this and possibly profiling with
https://github.com/SciML/ModelingToolkit.jl/pull/432 ?

Eventually it should get released."
I will try running this, but I must learn how to do it. In the meantime, I have implemented an event-based method for SIR simulation, and I hope to beat the Python library EoN, which is extremely fast. Right now, I am working on using a struct, and I want to avoid allocations. I have read a lot, and the issue is quite thorny.

I would like to post benchmarking code for you all to try out, but it is longer than possibly acceptable. Is there a best practice? Thanks.

Ehh, post it. I think that we could probably incorporate it into DiffEqBenchmarks to track it over time as well.

You could always post it to gist.github.com and provide a link here if you’d prefer.

@ChrisRackauckas, @anon94023334, @isaacsas

I would like to link two versions of a FastSIR code, one with a Node struct, and one with arrays instead (hoping to decrease memory allocation). I only gained about 15-20 percent in speed, with the same memory allocation. That was somewhat disappointing. I have my program in four files (to learn how to structure programs), and I will place them in a new GitHub repository that I will make public. In that way, we can have a complete program. I would love to hear more about how to optimize this code. If there is a better place to post this, I am open to suggestions.

Here is a link to a git repository you can download from. The code runs in Atom. I apologize in advance for poor code structure, construction, idioms, etc., as I am new to Julia.