Unexpected allocations after re-including unchanged file in ODE simulation

Hello!

I’m running a large number of parallel Monte Carlo simulations, so I’m working to minimize memory allocations in my code. While doing this, I encountered some behavior I don’t fully understand.

I’m using Julia Version 1.11.6 and OrdinaryDiffEq v6.101.0.

Consider the following code:

using Parameters, OrdinaryDiffEq;
const DEIntegrator = OrdinaryDiffEq.OrdinaryDiffEqCore.ODEIntegrator;

include("functions.jl");

solver_params = generate_solver_params();
integrator = get_integrator(solver_params);

@time output1 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output1 = solve_ode!(integrator, solver_params);

include("functions.jl");

reinit!(integrator);
@time output2 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output2 = solve_ode!(integrator, solver_params);

On the first run, the second @time shows zero allocations, as expected for compiled code. However, after re-including functions.jl (without making any changes to the file), subsequent @time calls show unexpected allocations even after compilation (see the fourth @time):

0.023522 seconds (5.08 k allocations: 262.219 KiB, 19.72% compilation time)
0.020661 seconds
0.028323 seconds (85.08 k allocations: 4.528 MiB, 15.60% compilation time)
0.025087 seconds (80.00 k allocations: 4.272 MiB)

The content of functions.jl is:

function get_initial_condition(total_dim::Int)
    x0=ones(Float64, total_dim)
    return x0
end

function generate_solver_params()
    total_dim::Int = 1000               # Dimension of the system
    t_end::Float64 = 10.0               # End time
    n_tot::Int = 10_000                 # Number of time steps
    ts_all = range(start=0.0, stop=t_end, length=n_tot+1)
    dt::Float64 = ts_all[2]-ts_all[1]   # Time step size
    decay_rate::Float64 = 1.0           # Constant decay rate
    return (; total_dim, t_end, n_tot, ts_all, dt, decay_rate)
end

function rhs!(dx::Vector{Float64}, x::Vector{Float64}, solver_params::P, t::Float64) where {P<:NamedTuple}
    @unpack decay_rate = solver_params
    dx .= -decay_rate .* x
    return nothing
end

function get_integrator(solver_params::P) where {P<:NamedTuple}
    @unpack total_dim,t_end,n_tot,ts_all,dt,decay_rate = solver_params
    prob = ODEProblem(rhs!, get_initial_condition(total_dim), (0.0, t_end), solver_params)
    integrator = init(prob, RK4();
        adaptive=false,
        dt=dt,
        dense=false,
        save_everystep=false,
        save_on=false,
        save_start=false,
        save_end=false)
    return integrator
end

function solve_ode!(integrator::I, solver_params::P) where {I<:DEIntegrator, P<:NamedTuple}
    @unpack n_tot, dt = solver_params
    for _ in 1:n_tot
        step!(integrator, dt)
    end
    return integrator.u
end

I suspect this behavior is related to Julia’s method invalidation or the way functions are recompiled when re-evaluated. Re-including functions.jl redefines rhs!, which likely triggers recompilation or invalidates previously compiled method specializations, leading to unexpected allocations.
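
To illustrate the kind of invalidation I have in mind, here is a minimal sketch that has nothing to do with OrdinaryDiffEq (the functions f and g are made up for illustration):

f(x) = 2x
g(x) = f(x) + 1   # g's compiled code can inline or specialize on f

@time g(1)        # first call compiles g
@time g(1)        # already compiled: no allocations reported

f(x) = 3x         # redefining f invalidates g's compiled specialization
@time g(1)        # the next call recompiles g, so @time reports compilation again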

When I explicitly wrap rhs! using ODEFunction{true}, specifying that it is in-place, the issue disappears:

prob = ODEProblem(ODEFunction{true}(rhs!), get_initial_condition(total_dim), (0.0, t_end), solver_params)

After this change, allocations remain at zero even after re-including functions.jl:

  0.487818 seconds (3.92 M allocations: 219.215 MiB, 8.36% gc time, 96.45% compilation time)
  0.017568 seconds
  0.301034 seconds (2.28 M allocations: 126.925 MiB, 2.18% gc time, 94.06% compilation time)
  0.021564 seconds

However, compilation takes significantly longer and consumes considerably more memory.

My questions are the following:

  1. What exactly causes the additional allocations when the wrapper is not used after re-including the file?
  2. Is the solution using ODEFunction{true} a universal and safe approach?
  3. In my actual application, rhs! is implemented as a functor with externally modified mutable fields. I’m concerned about potential memory issues or unintended behavior resulting from method invalidation or recompilation. Is explicitly wrapping the rhs! functor sufficient to avoid these problems in such cases? (A minimal sketch of the pattern I mean is below.)
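
For reference, here is a minimal sketch of the functor pattern from question 3 (the struct name DecayRHS and its field are placeholders, not my real code):

mutable struct DecayRHS
    decay_rate::Float64
end

# Callable like rhs!(dx, x, p, t), but reads the mutable field
function (f::DecayRHS)(dx::Vector{Float64}, x::Vector{Float64}, p, t)
    dx .= -f.decay_rate .* x
    return nothing
end

rhs = DecayRHS(1.0)
prob = ODEProblem(ODEFunction{true}(rhs), get_initial_condition(total_dim), (0.0, t_end), solver_params)
rhs.decay_rate = 2.0   # the field is modified externally between simulations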

Thank you for your time and any insights you can share!


Recompilation isn’t the reason, as suggested by the 4th run retaining 80k allocations after the 3rd, recompiling run; if you subtract that 80k from the 3rd run with compilation time, you also get the same 5.08k as the 1st run’s initial compilation of everything in functions.jl. You can try replacing the include("functions.jl") with evaluating individual methods, and you’ll find that 3 of the 5 don’t affect the 3rd and 4th runtimes at all. Invalidating solve_ode! gives the 3rd run 5.08k allocations like the 1st run, which is no surprise because the benchmark call compiles the “new” method. As you intuited, invalidating rhs! (see example.jl below) contributes 80k allocations, but no recompilation is reported (not sure why it doesn’t report even a small amount, because rhs! should get called indirectly via solve_ode!(integrator, …)), and the overhead occurs in every run:

example.jl
using Parameters, OrdinaryDiffEq;
const DEIntegrator = OrdinaryDiffEq.OrdinaryDiffEqCore.ODEIntegrator;

include("functions.jl");

solver_params = generate_solver_params();
integrator = get_integrator(solver_params);

@time output1 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output1 = solve_ode!(integrator, solver_params);

function rhs!(dx::Vector{Float64}, x::Vector{Float64}, solver_params::P, t::Float64) where {P<:NamedTuple}
    @unpack decay_rate = solver_params
    dx .= -decay_rate .* x
    return nothing
end

reinit!(integrator);
@time output2 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output2 = solve_ode!(integrator, solver_params);

which repeatably prints:

julia> include("example.jl");
  0.025145 seconds (5.08 k allocations: 263.094 KiB, 24.87% compilation time)
  0.019740 seconds
  0.022698 seconds (80.00 k allocations: 4.272 MiB)
  0.021038 seconds (80.00 k allocations: 4.272 MiB)

julia> include("example.jl");
  0.025133 seconds (5.08 k allocations: 261.906 KiB, 22.64% compilation time)
  0.019422 seconds
  0.021330 seconds (80.00 k allocations: 4.272 MiB)
  0.021140 seconds (80.00 k allocations: 4.272 MiB)

julia> reinit!(integrator); @time output3 = solve_ode!(integrator, solver_params); # EVERY run
  0.023598 seconds (80.00 k allocations: 4.272 MiB)

You see how the 2nd include’s 1st run is really the 5th run, but it got rid of the 80k overhead? That happens for your original demo as well, so let’s look at the lines you didn’t repeat:

solver_params = generate_solver_params();
integrator = get_integrator(solver_params);

So integrator got reinstantiated, and notably, get_integrator passes the problematic rhs! into ODEProblem. So let’s replace the 1st reinit! after the reevaluation of rhs! with get_integrator.

example2.jl
using Parameters, OrdinaryDiffEq;
const DEIntegrator = OrdinaryDiffEq.OrdinaryDiffEqCore.ODEIntegrator;

include("functions.jl");

solver_params = generate_solver_params();
integrator = get_integrator(solver_params);

@time output1 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output1 = solve_ode!(integrator, solver_params);

function rhs!(dx::Vector{Float64}, x::Vector{Float64}, solver_params::P, t::Float64) where {P<:NamedTuple}
    @unpack decay_rate = solver_params
    dx .= -decay_rate .* x
    return nothing
end

integrator = get_integrator(solver_params);
@time output2 = solve_ode!(integrator, solver_params);
reinit!(integrator);
@time output2 = solve_ode!(integrator, solver_params);

julia> include("example2.jl");
  0.028315 seconds (5.08 k allocations: 262.688 KiB, 23.12% compilation time)
  0.023090 seconds
  0.021286 seconds
  0.021221 seconds

julia> include("example2.jl");
  0.025580 seconds (5.08 k allocations: 261.906 KiB, 23.09% compilation time)
  0.019593 seconds
  0.019376 seconds
  0.019465 seconds

The 80k allocations are gone, and the critical difference is reinstantiating integrator after reevaluating the rhs! method.

Inspecting the monstrously huge structs of integrator shows that the ODEFunction{true} edit makes a significant difference in the structure and what gets compiled. Something that stood out to me was that it contained rhs! directly (the FullSpecialize option) instead of being wrapped in FunctionWrappers and FunctionWrappersWrappers (the AutoSpecialize option). When I tried this edit with example.jl or example2.jl, where only rhs! is reevaluated, the 3rd run does report recompilation, and to the same increased degree you observed.
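
If I read the SciMLBase docs right, the specialization level can also be requested explicitly when constructing the problem, rather than going through the ODEFunction{true} wrapper; a sketch, assuming the SciMLBase specialization API:

using SciMLBase   # for the FullSpecialize trait

# Default for a plain top-level function: AutoSpecialize, i.e. rhs! is hidden
# behind FunctionWrappers/FunctionWrappersWrappers
prob_auto = ODEProblem(rhs!, get_initial_condition(total_dim), (0.0, t_end), solver_params)

# FullSpecialize: rhs! is embedded directly in the problem/integrator types
prob_full = ODEProblem{true, SciMLBase.FullSpecialize}(rhs!,
    get_initial_condition(total_dim), (0.0, t_end), solver_params)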

julia> include("example.jl");
  1.423268 seconds (2.29 M allocations: 126.185 MiB, 23.36% gc time, 98.13% compilation time)
  0.020935 seconds
  0.547311 seconds (2.29 M allocations: 126.184 MiB, 7.61% gc time, 95.97% compilation time: 100% of which was recompilation)
  0.019844 seconds

julia> include("example2.jl");
  1.388704 seconds (2.29 M allocations: 126.185 MiB, 35.83% gc time, 95.95% compilation time)
  0.040224 seconds
  0.837534 seconds (2.29 M allocations: 126.184 MiB, 5.38% gc time, 97.21% compilation time: 100% of which was recompilation)
  0.022469 seconds

My guess as to why recompilation increased is that FunctionWrappers only points to compiled code of a contained function and isn’t parameterized by it, so the most it can do is recompile that one isolated function call, but embedding rhs! parameterizes large swathes of integrator’s type and thus solver methods’ call signatures, opening up inlining optimizations. Of course, that doesn’t explain why the original demo didn’t report any recompilation at all.

As for the original demo’s 80k overhead, FunctionWrappers is supposed to seamlessly adapt to method changes, and FunctionWrappersWrappers doesn’t appear to tack on anything weird, but maybe something went wrong somewhere. I personally don’t know how FunctionWrappers adapts to method changes and caches dispatch when by all appearances, the instance doesn’t change its function pointer. It’s not urgent given that reevaluating the same code is unnecessary and there are a couple ways to get around this weird overhead, but I think it’s worth opening an issue with this fairly minimal example to get real answers.
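
For anyone who wants to poke at this in isolation, here is a minimal sketch of the wrapper behavior without OrdinaryDiffEq involved (h is a made-up function):

using FunctionWrappers: FunctionWrapper

h(x) = 2x
fw = FunctionWrapper{Int, Tuple{Int}}(h)   # fixes the call signature and wraps h

fw(3)        # 6
h(x) = 3x    # redefine the method behind the wrapper
fw(3)        # should return 9 if the wrapper really adapts to method changes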


Some of this was addressed above, but to take your three questions in order:

Most of it is building a new integrator, which builds the caches.

Yes, well, if you use an in-place function f!(du, u, p, t) then ODEFunction{true} is good, and if you use f(u, p, t) then you’d need to use ODEFunction{false}. Skipping the auto-detection is fine and, for now (as of Julia v1.9), it improves type stability (until the handling in the compiler is improved to return to the state we had with @pure, details details).
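
Concretely, a sketch of the two forms using the decay example from above (rhs is a made-up out-of-place counterpart):

x0 = get_initial_condition(total_dim)

# In-place: mutates dx, wrap with ODEFunction{true}
prob_ip = ODEProblem(ODEFunction{true}(rhs!), x0, (0.0, t_end), solver_params)

# Out-of-place: returns the derivative, wrap with ODEFunction{false}
rhs(x, p, t) = -p.decay_rate .* x
prob_oop = ODEProblem(ODEFunction{false}(rhs), x0, (0.0, t_end), solver_params)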

That’s fine as long as you’re not threading. If you’re threading, then you need to be careful that each parallel simulation has a different instantiation of the rhs! so they don’t mutate the same memory.
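
As a sketch of what I mean, assuming the DecayRHS functor sketched earlier in the thread and made-up names n_sims and rates:

using Base.Threads

results = Vector{Vector{Float64}}(undef, n_sims)
@threads for i in 1:n_sims
    # Each task gets its own functor instance and its own integrator,
    # so no two simulations share mutable rhs! state or solver caches.
    local_rhs = DecayRHS(rates[i])
    prob = ODEProblem(ODEFunction{true}(local_rhs),
        get_initial_condition(total_dim), (0.0, t_end), solver_params)
    local_integrator = init(prob, RK4();
        adaptive=false, dt=dt, save_on=false, save_start=false, save_end=false)
    results[i] = copy(solve_ode!(local_integrator, solver_params))
end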
