# Improving performance of solving ODESystem (Catalyst.jl)

Hello,
I have an ODESystem that I evolve using DifferentialEquations.jl and would like to improve the performance. Essentially, I create a reaction network using Catalyst.jl and iterate over many (density, temperature) points. For each point, I solve the same system of equations (15 species and 83 reactions). However, for many points (the use case has ~3 million), my code becomes quite slow and performs many allocations. Iβm not entirely sure what the bottleneck here is, or whether significant speed-up is possible. When solving for ~3 million points, the code takes about an hour to finish when the loop is parallelised over 4 threads.

I would also like to know how to avoid allocations in this case. My input file (density & temperature points) is 206 kB and the output file (final number densities of 15 species) is 1.3 MB. Yet a total of 369 MiB is allocated when solving the system. Is it possible to reduce this? Iβve added the relevant snippet of code and a sample @timeit output at the bottom (solved for 4200 points).

Relevant snippet:

``````function evolve_system(odesys, u0, tspan, p; prob=nothing)
if (isnothing(prob))
@timeit to "make problem" prob = ODEProblem(odesys, u0, tspan, p)
@timeit to "auto jac" de = modelingtoolkitize(prob)
@timeit to "autojac problem" prob = ODEProblem(de, [], tspan, jac=true)
@timeit to "1st solve" sol = solve(prob, Rodas5(); saveeverystep=false)
return prob, sol
else
@timeit to "remake problem" prob = remake(prob; u0=u0, p=p, tspan=tspan)
@timeit to "solve" return solve(prob, Rodas5(); saveeverystep=false)[end]
end
end

function run(to:: TimerOutput, odesys, iter, tspan, abundances)
# 'iter' contains (density, temperature) points
@timeit to "create array" number_densities = zeros(size(iter)[1], length(species(odesys)))
@timeit to "main run" begin
# First run
@timeit to "1st calc n" n = calculate_number_densities(iter[1, 1], abundances)
u0vals = [n[str_replace(s)] for s in species(odesys)]
u0 = Pair.(species(odesys), u0vals)
@timeit to "1st evolve system" prob, _ = evolve_system(odesys, u0, tspan, [iter[1, 2]]; prob=nothing)
# Loop
@inbounds for i in 1:size(iter)[1]
p = [iter[i, 2]]
@timeit to "calc n" n = calculate_number_densities(iter[i, 1], abundances)
u0 = [n[str_replace(s)] for s in species(odesys)]
@timeit to "evolve system" number_densities[i, :] = evolve_system(odesys, u0, tspan, p; prob=prob)
end
end

return number_densities
end

function postprocess_file(infile::String, outfile::String, odesys, tspan,
abundances::Dict)
@timeit to "run" n = run(to, odesys, arr, tspan, abundances)

header = [str_replace(s) for s in species(odesys)]
@timeit to "write csv" CSV.write(outfile, table; delim=',')
end

function main(abundances::Dict, input_dir::String, output_dir::String,
network_file::String)
@timeit to "setup odesys" odesys = convert(ODESystem, rn; combinatoric_ratelaws=false)
tspan = (1e-8, 1e6)
infile = "\$(input_dir)/rho_T_test.csv"
outfile = "\$(output_dir)/catalyst_test.csv"
@timeit to "postprocess file" postprocess_file(infile, outfile, odesys, tspan, abundances)
@show to
println()
end
``````
``````to =  ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Time                    Allocations
βββββββββββββββββββββββ   ββββββββββββββββββββββββ
Tot / % measured:            80.0s /  86.0%           6.54GiB /  79.8%

Section               ncalls     time    %tot     avg     alloc    %tot      avg
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
postprocess file           1    61.7s   89.8%   61.7s   4.69GiB   90.0%  4.69GiB
run                      1    58.6s   85.3%   58.6s   4.32GiB   82.9%  4.32GiB
main run               1    58.1s   84.6%   58.1s   4.30GiB   82.4%  4.30GiB
1st evolve s...      1    45.4s   66.0%   45.4s   3.79GiB   72.7%  3.79GiB
1st solve          1    29.1s   42.4%   29.1s   2.66GiB   51.0%  2.66GiB
autojac pr...      1    6.42s    9.3%   6.42s    457MiB    8.6%   457MiB
auto jac           1    5.40s    7.9%   5.40s    346MiB    6.5%   346MiB
make problem       1    4.26s    6.2%   4.26s    340MiB    6.4%   340MiB
evolve system    4.20k    11.8s   17.1%  2.80ms    381MiB    7.1%  92.8KiB
solve          4.20k    11.5s   16.8%  2.75ms    369MiB    6.9%  90.1KiB
remake pro...  4.20k    822ΞΌs    0.0%   196ns     0.00B    0.0%    0.00B
1st calc n           1    532ms    0.8%   532ms   44.5MiB    0.8%  44.5MiB
calc n           4.20k    106ms    0.2%  25.2ΞΌs   32.2MiB    0.6%  7.84KiB
create array           1   3.50ms    0.0%  3.50ms    493KiB    0.0%   493KiB
write csv                1    2.02s    2.9%   2.02s    311MiB    5.8%   311MiB
read rho-T file          1    509ms    0.7%   509ms   7.39MiB    0.1%  7.39MiB
create table             1    443ms    0.6%   443ms   54.7MiB    1.0%  54.7MiB
setup odesys               1    4.70s    6.8%   4.70s    358MiB    6.7%   358MiB
read network file          1    2.34s    3.4%   2.34s    177MiB    3.3%   177MiB
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
``````

Thanks in advance and please let me know if I should include any other information! Happy to provide necessary files for reproducibility.

Instead of recreating and recompiling the system each time, just use `remake` to change the parameters of the compiled system each time and youβll be fine.

1 Like

I `remake` the problem now with

``````remake(prob; u0=u0, p=p, tspan=tspan)
``````

(i.e. without assigning to `prob` like I was doing before) but donβt see any significant improvement. The `solve` function still takes up a majority of allocations and time in the loop (everything thatβs called 4200 times). Is it possible to speed this up?

Well first of all, thatβs not an argument so itβs throwing tons of warnings at you and you should fix that to `save_everystep`.

But then, can you isolate the `solve` to just be benchmarking that?

Fixed the βsave_everystepβ but I wasnβt getting any warnings in the output. Not sure if Iβve suppressed them somehow? Below just the `solve` (not including the 1st call)
Isolating so the timer is only timing the solve itself:

``````to =  ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Time                    Allocations
βββββββββββββββββββββββ   ββββββββββββββββββββββββ
Tot / % measured:      15.6s /  92.8%            509MiB /  78.8%

Section   ncalls     time    %tot     avg     alloc    %tot      avg
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
solve      4.20k    14.4s  100.0%  3.44ms    402MiB  100.0%  97.9KiB
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
``````

Just the original code but only with the solve being logged:

``````to =  ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Time                    Allocations
βββββββββββββββββββββββ   ββββββββββββββββββββββββ
Tot / % measured:      83.2s /  18.2%           5.98GiB /   6.6%

Section   ncalls     time    %tot     avg     alloc    %tot      avg

ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
solve      4.20k    15.2s  100.0%  3.61ms    402MiB  100.0%  97.9KiB
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
``````

And subsequent calls to it?

Those are all of the calls to `solve` within the file, except for the first call that does the compilation. Maybe I misunderstood the original benchmark setup?

I notice that the following example (Ordinary Differential Equations Β· DifferentialEquations.jl) describes creating a non-allocating functions. Following this example on my machine, I get a memory estimate of 10.81 MiB with 100,151 allocations in the standard case, and a memory estimate of 1.37 MiB with 11,745 allocations in the non-allocating case. I understand the method here; is it possible to do the same using Catalyst / ModelingToolkit?

Catalyst/ModelingToolkit build non-allocating functions. You can see this because your allocations wonβt change if you turn saving off and increase the time span (i.e. longer solves donβt increase the allocations). Thus the memory cost is:

1. Compiling new code
2. Setting up a new solve call
3. Saving values out

If youβre doing `save_everystep = false` you should be fine on (3). But are you avoiding code compilation each time? Itβs only if `prob !== nothing`, which I havenβt looked at the code close enough to see but given you have a branch to protect against it youβre probably fine.

So then if youβve done that correctly (and are measuring post compilation), the only other allocation is in the startup time of the `solve` call. You can move to the integrator interface (Integrator Interface Β· DifferentialEquations.jl) and just have a single integrator that `reinit!`s to avoid recreating the solver caches, and this would make it fully non-allocating.

But to know if thatβs even required, you should try benchmarking first. Use GitHub - tkluck/StatProfilerHTML.jl: Show Julia profiling data in an explorable HTML page to create flamegraph: that will tell you what area of the code is taking all of the time.

(Iβm still travelling so I havenβt gotten a chance to run it, but the flamegraph is the first thing I would make)