# Improving performance of solving ODESystem (Catalyst.jl)

Hello,
I have an ODESystem that I evolve using DifferentialEquations.jl and would like to improve the performance. Essentially, I create a reaction network using Catalyst.jl and iterate over many (density, temperature) points. For each point, I solve the same system of equations (15 species and 83 reactions). However, for many points (the use case has ~3 million), my code becomes quite slow and performs many allocations. I’m not entirely sure what the bottleneck here is, or whether significant speed-up is possible. When solving for ~3 million points, the code takes about an hour to finish when the loop is parallelised over 4 threads.

I would also like to know how to avoid allocations in this case. My input file (density & temperature points) is 206 kB and the output file (final number densities of 15 species) is 1.3 MB. Yet a total of 369 MiB is allocated when solving the system. Is it possible to reduce this? I’ve added the relevant snippet of code and a sample @timeit output at the bottom (solved for 4200 points).

Relevant snippet:

``````function evolve_system(odesys, u0, tspan, p; prob=nothing)
if (isnothing(prob))
@timeit to "make problem" prob = ODEProblem(odesys, u0, tspan, p)
@timeit to "auto jac" de = modelingtoolkitize(prob)
@timeit to "autojac problem" prob = ODEProblem(de, [], tspan, jac=true)
@timeit to "1st solve" sol = solve(prob, Rodas5(); saveeverystep=false)
return prob, sol
else
@timeit to "remake problem" prob = remake(prob; u0=u0, p=p, tspan=tspan)
@timeit to "solve" return solve(prob, Rodas5(); saveeverystep=false)[end]
end
end

function run(to:: TimerOutput, odesys, iter, tspan, abundances)
# 'iter' contains (density, temperature) points
@timeit to "create array" number_densities = zeros(size(iter)[1], length(species(odesys)))
@timeit to "main run" begin
# First run
@timeit to "1st calc n" n = calculate_number_densities(iter[1, 1], abundances)
u0vals = [n[str_replace(s)] for s in species(odesys)]
u0 = Pair.(species(odesys), u0vals)
@timeit to "1st evolve system" prob, _ = evolve_system(odesys, u0, tspan, [iter[1, 2]]; prob=nothing)
# Loop
@inbounds for i in 1:size(iter)[1]
p = [iter[i, 2]]
@timeit to "calc n" n = calculate_number_densities(iter[i, 1], abundances)
u0 = [n[str_replace(s)] for s in species(odesys)]
@timeit to "evolve system" number_densities[i, :] = evolve_system(odesys, u0, tspan, p; prob=prob)
end
end

return number_densities
end

function postprocess_file(infile::String, outfile::String, odesys, tspan,
abundances::Dict)
@timeit to "run" n = run(to, odesys, arr, tspan, abundances)

header = [str_replace(s) for s in species(odesys)]
@timeit to "write csv" CSV.write(outfile, table; delim=',')
end

function main(abundances::Dict, input_dir::String, output_dir::String,
network_file::String)
@timeit to "setup odesys" odesys = convert(ODESystem, rn; combinatoric_ratelaws=false)
tspan = (1e-8, 1e6)
infile = "\$(input_dir)/rho_T_test.csv"
outfile = "\$(output_dir)/catalyst_test.csv"
@timeit to "postprocess file" postprocess_file(infile, outfile, odesys, tspan, abundances)
@show to
println()
end
``````
``````to =  ────────────────────────────────────────────────────────────────────────────────
Time                    Allocations
───────────────────────   ────────────────────────
Tot / % measured:            80.0s /  86.0%           6.54GiB /  79.8%

Section               ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────
postprocess file           1    61.7s   89.8%   61.7s   4.69GiB   90.0%  4.69GiB
run                      1    58.6s   85.3%   58.6s   4.32GiB   82.9%  4.32GiB
main run               1    58.1s   84.6%   58.1s   4.30GiB   82.4%  4.30GiB
1st evolve s...      1    45.4s   66.0%   45.4s   3.79GiB   72.7%  3.79GiB
1st solve          1    29.1s   42.4%   29.1s   2.66GiB   51.0%  2.66GiB
autojac pr...      1    6.42s    9.3%   6.42s    457MiB    8.6%   457MiB
auto jac           1    5.40s    7.9%   5.40s    346MiB    6.5%   346MiB
make problem       1    4.26s    6.2%   4.26s    340MiB    6.4%   340MiB
evolve system    4.20k    11.8s   17.1%  2.80ms    381MiB    7.1%  92.8KiB
solve          4.20k    11.5s   16.8%  2.75ms    369MiB    6.9%  90.1KiB
remake pro...  4.20k    822μs    0.0%   196ns     0.00B    0.0%    0.00B
1st calc n           1    532ms    0.8%   532ms   44.5MiB    0.8%  44.5MiB
calc n           4.20k    106ms    0.2%  25.2μs   32.2MiB    0.6%  7.84KiB
create array           1   3.50ms    0.0%  3.50ms    493KiB    0.0%   493KiB
write csv                1    2.02s    2.9%   2.02s    311MiB    5.8%   311MiB
read rho-T file          1    509ms    0.7%   509ms   7.39MiB    0.1%  7.39MiB
create table             1    443ms    0.6%   443ms   54.7MiB    1.0%  54.7MiB
setup odesys               1    4.70s    6.8%   4.70s    358MiB    6.7%   358MiB
read network file          1    2.34s    3.4%   2.34s    177MiB    3.3%   177MiB
────────────────────────────────────────────────────────────────────────────────
``````

Thanks in advance and please let me know if I should include any other information! Happy to provide necessary files for reproducibility.

Instead of recreating and recompiling the system each time, just use `remake` to change the parameters of the compiled system each time and you’ll be fine.

1 Like

I `remake` the problem now with

``````remake(prob; u0=u0, p=p, tspan=tspan)
``````

(i.e. without assigning to `prob` like I was doing before) but don’t see any significant improvement. The `solve` function still takes up a majority of allocations and time in the loop (everything that’s called 4200 times). Is it possible to speed this up?

Well first of all, that’s not an argument so it’s throwing tons of warnings at you and you should fix that to `save_everystep`.

But then, can you isolate the `solve` to just be benchmarking that?

Fixed the “save_everystep” but I wasn’t getting any warnings in the output. Not sure if I’ve suppressed them somehow? Below just the `solve` (not including the 1st call)
Isolating so the timer is only timing the solve itself:

``````to =  ────────────────────────────────────────────────────────────────────
Time                    Allocations
───────────────────────   ────────────────────────
Tot / % measured:      15.6s /  92.8%            509MiB /  78.8%

Section   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────
solve      4.20k    14.4s  100.0%  3.44ms    402MiB  100.0%  97.9KiB
────────────────────────────────────────────────────────────────────
``````

Just the original code but only with the solve being logged:

``````to =  ────────────────────────────────────────────────────────────────────
Time                    Allocations
───────────────────────   ────────────────────────
Tot / % measured:      83.2s /  18.2%           5.98GiB /   6.6%

Section   ncalls     time    %tot     avg     alloc    %tot      avg

────────────────────────────────────────────────────────────────────
solve      4.20k    15.2s  100.0%  3.61ms    402MiB  100.0%  97.9KiB
────────────────────────────────────────────────────────────────────
``````

And subsequent calls to it?

Those are all of the calls to `solve` within the file, except for the first call that does the compilation. Maybe I misunderstood the original benchmark setup?

I notice that the following example (Ordinary Differential Equations · DifferentialEquations.jl) describes creating a non-allocating functions. Following this example on my machine, I get a memory estimate of 10.81 MiB with 100,151 allocations in the standard case, and a memory estimate of 1.37 MiB with 11,745 allocations in the non-allocating case. I understand the method here; is it possible to do the same using Catalyst / ModelingToolkit?

Catalyst/ModelingToolkit build non-allocating functions. You can see this because your allocations won’t change if you turn saving off and increase the time span (i.e. longer solves don’t increase the allocations). Thus the memory cost is:

1. Compiling new code
2. Setting up a new solve call
3. Saving values out

If you’re doing `save_everystep = false` you should be fine on (3). But are you avoiding code compilation each time? It’s only if `prob !== nothing`, which I haven’t looked at the code close enough to see but given you have a branch to protect against it you’re probably fine.

So then if you’ve done that correctly (and are measuring post compilation), the only other allocation is in the startup time of the `solve` call. You can move to the integrator interface (Integrator Interface · DifferentialEquations.jl) and just have a single integrator that `reinit!`s to avoid recreating the solver caches, and this would make it fully non-allocating.

But to know if that’s even required, you should try benchmarking first. Use GitHub - tkluck/StatProfilerHTML.jl: Show Julia profiling data in an explorable HTML page to create flamegraph: that will tell you what area of the code is taking all of the time.

(I’m still travelling so I haven’t gotten a chance to run it, but the flamegraph is the first thing I would make)