Improving performance of solving ODESystem (Catalyst.jl)

I have an ODESystem that I evolve using DifferentialEquations.jl and would like to improve the performance. Essentially, I create a reaction network using Catalyst.jl and iterate over many (density, temperature) points. For each point, I solve the same system of equations (15 species and 83 reactions). However, for many points (the use case has ~3 million), my code becomes quite slow and performs many allocations. I’m not entirely sure what the bottleneck here is, or whether significant speed-up is possible. When solving for ~3 million points, the code takes about an hour to finish when the loop is parallelised over 4 threads.

I would also like to know how to avoid allocations in this case. My input file (density & temperature points) is 206 kB and the output file (final number densities of 15 species) is 1.3 MB. Yet a total of 369 MiB is allocated when solving the system. Is it possible to reduce this? I’ve added the relevant snippet of code and a sample @timeit output at the bottom (solved for 4200 points).

Relevant snippet:

function evolve_system(odesys, u0, tspan, p; prob=nothing)
  if (isnothing(prob))
    @timeit to "make problem" prob = ODEProblem(odesys, u0, tspan, p)
    @timeit to "auto jac" de = modelingtoolkitize(prob)
    @timeit to "autojac problem" prob = ODEProblem(de, [], tspan, jac=true)
    @timeit to "1st solve" sol = solve(prob, Rodas5(); saveeverystep=false)
    return prob, sol
    @timeit to "remake problem" prob = remake(prob; u0=u0, p=p, tspan=tspan)
    @timeit to "solve" return solve(prob, Rodas5(); saveeverystep=false)[end]

function run(to:: TimerOutput, odesys, iter, tspan, abundances)
  # 'iter' contains (density, temperature) points
  @timeit to "create array" number_densities = zeros(size(iter)[1], length(species(odesys)))
  @timeit to "main run" begin
    # First run
    @timeit to "1st calc n" n = calculate_number_densities(iter[1, 1], abundances)
    u0vals = [n[str_replace(s)] for s in species(odesys)]
    u0 = Pair.(species(odesys), u0vals)
    @timeit to "1st evolve system" prob, _ = evolve_system(odesys, u0, tspan, [iter[1, 2]]; prob=nothing)
    # Loop
    @inbounds for i in 1:size(iter)[1]
      p = [iter[i, 2]]
      @timeit to "calc n" n = calculate_number_densities(iter[i, 1], abundances)
      u0 = [n[str_replace(s)] for s in species(odesys)]
      @timeit to "evolve system" number_densities[i, :] = evolve_system(odesys, u0, tspan, p; prob=prob)

  return number_densities

function postprocess_file(infile::String, outfile::String, odesys, tspan,
    @timeit to "read rho-T file" arr = read_density_temperature_file(infile)
    @timeit to "run" n = run(to, odesys, arr, tspan, abundances)

    header = [str_replace(s) for s in species(odesys)]
    @timeit to "create table" table = Tables.table(n; header=header)
    @timeit to "write csv" CSV.write(outfile, table; delim=',')

function main(abundances::Dict, input_dir::String, output_dir::String,
  @timeit to "read network file" rn = read_network_file(network_file)
  @timeit to "setup odesys" odesys = convert(ODESystem, rn; combinatoric_ratelaws=false)
  tspan = (1e-8, 1e6)
  infile = "$(input_dir)/rho_T_test.csv"
  outfile = "$(output_dir)/catalyst_test.csv"
  @timeit to "postprocess file" postprocess_file(infile, outfile, odesys, tspan, abundances)
  @show to
to =  ────────────────────────────────────────────────────────────────────────────────
                                        Time                    Allocations      
                               ───────────────────────   ────────────────────────
       Tot / % measured:            80.0s /  86.0%           6.54GiB /  79.8%    

 Section               ncalls     time    %tot     avg     alloc    %tot      avg
 postprocess file           1    61.7s   89.8%   61.7s   4.69GiB   90.0%  4.69GiB
   run                      1    58.6s   85.3%   58.6s   4.32GiB   82.9%  4.32GiB
     main run               1    58.1s   84.6%   58.1s   4.30GiB   82.4%  4.30GiB
       1st evolve s...      1    45.4s   66.0%   45.4s   3.79GiB   72.7%  3.79GiB
         1st solve          1    29.1s   42.4%   29.1s   2.66GiB   51.0%  2.66GiB
         autojac pr...      1    6.42s    9.3%   6.42s    457MiB    8.6%   457MiB
         auto jac           1    5.40s    7.9%   5.40s    346MiB    6.5%   346MiB
         make problem       1    4.26s    6.2%   4.26s    340MiB    6.4%   340MiB
       evolve system    4.20k    11.8s   17.1%  2.80ms    381MiB    7.1%  92.8KiB
         solve          4.20k    11.5s   16.8%  2.75ms    369MiB    6.9%  90.1KiB
         remake pro...  4.20k    822μs    0.0%   196ns     0.00B    0.0%    0.00B
       1st calc n           1    532ms    0.8%   532ms   44.5MiB    0.8%  44.5MiB
       calc n           4.20k    106ms    0.2%  25.2μs   32.2MiB    0.6%  7.84KiB
     create array           1   3.50ms    0.0%  3.50ms    493KiB    0.0%   493KiB
   write csv                1    2.02s    2.9%   2.02s    311MiB    5.8%   311MiB
   read rho-T file          1    509ms    0.7%   509ms   7.39MiB    0.1%  7.39MiB
   create table             1    443ms    0.6%   443ms   54.7MiB    1.0%  54.7MiB
 setup odesys               1    4.70s    6.8%   4.70s    358MiB    6.7%   358MiB
 read network file          1    2.34s    3.4%   2.34s    177MiB    3.3%   177MiB

Thanks in advance and please let me know if I should include any other information! Happy to provide necessary files for reproducibility.

Instead of recreating and recompiling the system each time, just use remake to change the parameters of the compiled system each time and you’ll be fine.

1 Like

I remake the problem now with

remake(prob; u0=u0, p=p, tspan=tspan)

(i.e. without assigning to prob like I was doing before) but don’t see any significant improvement. The solve function still takes up a majority of allocations and time in the loop (everything that’s called 4200 times). Is it possible to speed this up?

Well first of all, that’s not an argument so it’s throwing tons of warnings at you and you should fix that to save_everystep.

But then, can you isolate the solve to just be benchmarking that?

Fixed the “save_everystep” but I wasn’t getting any warnings in the output. Not sure if I’ve suppressed them somehow? Below just the solve (not including the 1st call)
Isolating so the timer is only timing the solve itself:

to =  ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      15.6s /  92.8%            509MiB /  78.8%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 solve      4.20k    14.4s  100.0%  3.44ms    402MiB  100.0%  97.9KiB

Just the original code but only with the solve being logged:

to =  ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      83.2s /  18.2%           5.98GiB /   6.6%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg

 solve      4.20k    15.2s  100.0%  3.61ms    402MiB  100.0%  97.9KiB

And subsequent calls to it?

Those are all of the calls to solve within the file, except for the first call that does the compilation. Maybe I misunderstood the original benchmark setup?

I notice that the following example (Ordinary Differential Equations · DifferentialEquations.jl) describes creating a non-allocating functions. Following this example on my machine, I get a memory estimate of 10.81 MiB with 100,151 allocations in the standard case, and a memory estimate of 1.37 MiB with 11,745 allocations in the non-allocating case. I understand the method here; is it possible to do the same using Catalyst / ModelingToolkit?

Catalyst/ModelingToolkit build non-allocating functions. You can see this because your allocations won’t change if you turn saving off and increase the time span (i.e. longer solves don’t increase the allocations). Thus the memory cost is:

  1. Compiling new code
  2. Setting up a new solve call
  3. Saving values out

If you’re doing save_everystep = false you should be fine on (3). But are you avoiding code compilation each time? It’s only if prob !== nothing, which I haven’t looked at the code close enough to see but given you have a branch to protect against it you’re probably fine.

So then if you’ve done that correctly (and are measuring post compilation), the only other allocation is in the startup time of the solve call. You can move to the integrator interface (Integrator Interface · DifferentialEquations.jl) and just have a single integrator that reinit!s to avoid recreating the solver caches, and this would make it fully non-allocating.

But to know if that’s even required, you should try benchmarking first. Use GitHub - tkluck/StatProfilerHTML.jl: Show Julia profiling data in an explorable HTML page to create flamegraph: that will tell you what area of the code is taking all of the time.

(I’m still travelling so I haven’t gotten a chance to run it, but the flamegraph is the first thing I would make)