How can I improve the computational performance of my code (ODEProblem solving)?

I am currently building a library that can flexibly solve and optimize a variety of hydrological models (including some equations), which typically belongs to the ODEProblem. However, to ensure the flexibility of the library, I cannot build separate ODEFunction for different models. Therefore, I traverse all equations of a model to complete the computation of all input variables required to solve the ODEProblem in advance. But after building it, I found that my calculation method greatly affects the solving performance of the ODEProblem. The reference code is as follows:

# import pkg
using OrdinaryDiffEq
using Interpolations
using CSV
using DataFrames
using SciMLSensitivity
using BenchmarkTools

# read example data and interpolate them
file_path = "E:\\JlCode\\LumpedHydro.jl\\data\\exphydro\\01013500.csv"
data = CSV.File(file_path);
df = DataFrame(data);
ts = 1:10000
lday_vec = df[ts, "dayl(day)"]
prcp_vec = df[ts, "prcp(mm/day)"]
temp_vec = df[ts, "tmean(C)"]
flow_vec = df[ts, "flow(mm)"]

itp_method = SteffenMonotonicInterpolation()
itp_Lday = interpolate(ts, lday_vec, itp_method)
itp_P = interpolate(ts, prcp_vec, itp_method)
itp_T = interpolate(ts, temp_vec, itp_method)

Then there is the normal version

# this is the normal version of ODEProblem for OrdinaryDiffEq
step_fct(x) = (tanh(5.0 * x) + 1.0) * 0.5

# snow precipitation
Ps(P, T, Tmin) = step_fct(Tmin - T) * P
# rain precipitation
Pr(P, T, Tmin) = step_fct(T - Tmin) * P
# snow melt
M(S0, T, Df, Tmax) = step_fct(T - Tmax) * step_fct(S0) * minimum([S0, Df * (T - Tmax)])
# evapotranspiration
PET(T, Lday) = 29.8 * Lday * 0.611 * exp((17.3 * T) / (T + 237.3)) / (T + 273.2)
ET(S1, T, Lday, Smax) = step_fct(S1) * step_fct(S1 - Smax) * PET(T, Lday) + step_fct(S1) * step_fct(Smax - S1) * PET(T, Lday) * (S1 / Smax)
# base flow
Qb(S1, f, Smax, Qmax) = step_fct(S1) * step_fct(S1 - Smax) * Qmax + step_fct(S1) * step_fct(Smax - S1) * Qmax * exp(-f * (Smax - S1))
# peak flow
Qs(S1, Smax) = step_fct(S1) * step_fct(S1 - Smax) * (S1 - Smax)

function exp_hydro_optim_states!(dS, S, ps, t)
    f, Smax, Qmax, Df, Tmax, Tmin = ps

    Lday = itp_Lday(t)
    P = itp_P(t)
    T = itp_T(t)

    Q_out = Qb(S[2], f, Smax, Qmax) + Qs(S[2], Smax)

    dS[1] = Ps(P, T, Tmin) - M(S[1], T, Df, Tmax)
    dS[2] = Pr(P, T, Tmin) + M(S[1], T, Df, Tmax) - ET(S[2], T, Lday, Smax) - Q_out
end

and a modified version to adapt to flexible solutions.

step_fct(x) = (tanh(5.0 * x) + 1.0) * 0.5

# snow precipitation
Ps(i, p) = step_fct(p[:Tmin] - i[:T]) * i[:P]

# rain precipitation
Pr(i, p) = step_fct(i[:T] - p[:Tmin]) * i[:P]

# snow melt
M(i, p) = step_fct(i[:T] - p[:Tmax]) * step_fct(i[:S0]) * minimum([i[:S0], p[:Df] * (i[:T] - p[:Tmax])])

# evapotranspiration
PET(i, p) = 29.8 * i[:Lday] * 0.611 * exp((17.3 * i[:T]) / (i[:T] + 237.3)) / (i[:T] + 273.2)
ET(i, p) = step_fct(i[:S1]) * step_fct(i[:S1] - p[:Smax]) * PET(i, p) + step_fct(i[:S1]) * step_fct(p[:Smax] - i[:S1]) * PET(i, p) * (i[:S1] / p[:Smax])

# base flow
Qb(i, p) = step_fct(i[:S1]) * step_fct(i[:S1] - p[:Smax]) * p[:Qmax] + step_fct(i[:S1]) * step_fct(p[:Smax] - i[:S1]) * p[:Qmax] * exp(-p[:f] * (p[:Smax] - i[:S1]))

# peak flow
Qs(i, p) = step_fct(i[:S1]) * step_fct(i[:S1] - p[:Smax]) * (i[:S1] - p[:Smax])
func_list = [Ps, Pr, M, PET, ET, Qb, Qs]

function exp_hydro_optim_states!(dS, S, ps, t)
    p = namedtuple([:f, :Smax, :Qmax, :Df, :Tmax, :Tmin], collect(ps))

    Lday = itp_Lday(t)
    P = itp_P(t)
    T = itp_T(t)
    fluxes = (Lday=Lday, P=P, T=T, S0=S[1], S1=S[2])

    for func in func_list
        tmp_fluxes = namedtuple([nameof(func)], [func(fluxes, p)])
        fluxes = merge(fluxes, tmp_fluxes)
    end

    dS[1] = fluxes[:Ps] - fluxes[:M]
    dS[2] = fluxes[:Pr] + fluxes[:M] - fluxes[:ET] - fluxes[:Qb] - fluxes[:Qs]
end

In this modified code, I use namedtuple to store and continuously update the calculated variables, but this seems to seriously affect the performance of ODEProblem solving, see the performance comparison below.

t_out = 1:10000
prob = ODEProblem(exp_hydro_optim_states!, [0.0, 1303.004248], Float64.((t_out[1], maximum(t_out))))

@btime sol = solve(prob, Tsit5(), u0=[0.0, 1303.004248],
    p=[0.01674478, 1709.461015, 18.46996175, 2.674548848, 0.175739196, -2.092959084],
    saveat=t_out, reltol=1e-3, abstol=1e-3, sensealg=ForwardDiffSensitivity())

The computing performance of the normal version is:

14.956 ms (1006304 allocations: 20.71 MiB)

The modified version is:

451.091 ms (5540900 allocations: 180.06 MiB)

I would like to ask if there is a better solution here that ensures both code flexibility and, as much as possible, improves computational performance.
PS: I have tried building the ODEProblem using ModelingToolkit.jl, which does indeed solve this problem, but it has some limitations in parameter optimization.

The test data can be seen in GitHub - chooron/CamelsDataForTest

All of your interpolation functions are non-constant globals. The type instability from that will be a huge hit to performance. You need to do the standard basic optimizations first.

Hello, thank you for your response. However, it seems that there is no significant difference in the interpolation method between these two versions of the code, yet there is a considerable change in performance. In the modified version, the function is continuously traversed to obtain the variables required to solve for du. During this process, a namedtuple is used to store the data, which may affect the performance of the solution. Therefore, I would like to inquire if there is a better implementation for this situation.

Doing the named tuples like that would not be type stable and is allocating a bunch. Again, start by making a well-typed non-global non-allocating code, and then do other things. There’s just a ton of very simple optimizations to start with in this example before doing anything fancy.

All of your interpolation functions are non-constant globals. The type instability from that will be a huge hit to performance. You need to do the standard basic optimizations first.

This suggestion can reduce the computation time from 14ms to 5ms, which indeed plays a very good role in optimization. Thank you for your suggestion.

Doing the named tuples like that would not be type stable and is allocating a bunch.

Perhaps I should avoid using the namedtuple type. I will carefully read the content about Julia performance optimization before I build my package.

From me ten days later, because a considerable number of intermediate variables will be generated during the calculation of du, using the NamedTuple type to continuously save and update the state will greatly affect the efficiency of solving the model, resulting in a very long running time. In the past few days, I have found a feasible alternative method (without using ModelingToolkit.jl). There is a library in Julia called Symbolics.jl, which is an important library supporting ModelingToolkit.jl. At the same time, we can construct the calculation equation through Symbolics.jl, just like constructing the equation with ModelingToolkit.jl. The difference is that here I used the build_function method in Symbolics.jl, and obtained a simplified equation for calculating du without intermediate variables through substitute. Then, I manually constructed ode_func! and solved it, which can significantly improve the computational performance.

Yup that’s usually a simple way to solve some of these issues.