Optimising allocations and memory use in ODEProblems

markjholmes · September 17, 2024, 6:31pm

If I have an ODEProblem with many lines of computation going on at each step, when (if ever) does it become sensible to pre-allocate the space for these computations inside the parameters, p?

With a simplified example, we compare two versions of a function:

sys! computes the matrix-vector product a * u from scratch every time
sys2! has a parameter b which is updated with the value of a * u every step

using DifferentialEquations

# number of state variables, initial conditions, and time
n_state = 10
u0 = rand(n_state)
tspan = (0.0, 10.0)

# system 1 has to allocate matrix-vector product every time
function sys!(du, u, p, t)
    du[1:p.n] .= u .* (p.r .- p.a * u)
    return nothing
end

# system 1 parameters
p = (n = n_state, r = rand(n_state), a = 0.1rand(n_state, n_state))

# put into problem
prob = ODEProblem(sys!, u0, tspan, p)

# system 2 already has b inside parameters and just overwrites with matrix-vector product
function sys2!(du, u, p, t)
    p.b[1:p.n] .= p.a * u # now it's going into p.b instead of its own thing
    @. du[1:p.n] = u * (p.r - p.b) # i can also use @. now, but doesn't affect time 
    return nothing
end

# same parameters, just extra empty b
p2 = merge(p, (b = zeros(n_state),)) 

prob2 = ODEProblem(sys2!, u0, tspan, p2)

I end up using 15% less time and memory the second way:

using BenchmarkTools

@benchmark solve(prob, abstol = 1e-6, reltol = 1e-6)
# Time (median): 67.354 μs, Memory estimate: 113.14 KiB, allocs estimate: 1034.

@benchmark solve(prob2, abstol = 1e-6, reltol = 1e-6)
# Time (median): 56.990 μs, Memory estimate: 99.48 KiB, allocs estimate: 888.

For larger, more complicated functions, does this mean that every computation that takes place within an ODEProblem step should be preallocated? Is p really the wisest place to store this? Should I be preallocating indices too, i.e. putting ids = 1:n_state inside p? Why does the difference disappear when I compare just the sys! and sys2! function benchmarks directly (without the ODEProblem around them)?

Any answers would be great, thanks!

Edit: I’ve restarted my REPL and now the difference is gone, but my question still remains, even if my tests are dumb. Is it ever wise to preallocate and modify-in-place all these computations that happen within the function or is the compiler far smarter than me?

JonasWickman · September 17, 2024, 7:53pm

Here are a few things to consider regarding when to preallocate, and the possible complications it can lead to.

Getting preallocation right can be a bit tricky. Case in point, your sys2! function still allocates, because p.a * u allocates a new array before it’s stored in p.b. To get rid of the allocation, you could write the function like this, for example:

using LinearAlgebra
function sys2!(du, u, p, t)
    mul!(p.b, p.a, u)
    @. du[1:p.n] = u * (p.r - p.b) 
    return nothing
end

where mul! from the LinearAlgebra standard library multiplies p.a with u without allocating a temporary array.

On a related trickyness note, some ODE solvers in DifferentialEquations.jl will not automatically play nice with preallocation, and you’ll have to use PreallocationTools.jl to get it to work properly.

For small systems, you can use StaticArrays.jl, which don’t allocate and are very fast. The ballpark for when this is good is when an array contains less than about 100 elements (so no bigger than a 10x10 matrix).
The parameter p is probably your best option for putting arrays for temporary allocation. Another option is to write something like:

my_rhs_fun!(du, u, (p, cache), t)
...
end
# Code for setting up the cache goes here
ode_prob = ODEProblem{true}(my_rhs_fun!, u0, t_span, (p, cache))

Depending on your problem, avoiding allocations can save quite a lot of time, but (at least in my experience), it can be quite tricky to figure out when.

markjholmes · September 18, 2024, 9:14am

This is great stuff, thanks Jonas!

StaticArrays unfortunately isn’t an option in my system due to the size but stuff like mul! should help. I really like the idea of keeping the cache separate, as you showed. I didn’t know that was possible and playing around with the parameters makes me nervous as it seems destined for typos and errors. I’ll definitely have a look into PreallocationTools too