DifferentialEquations.jl: running out of memory with large system size ODE

Joh11 · April 15, 2021, 4:32pm

Hi everyone,

I need to solve a first order, homogeneous, time independent ODE on a relatively big system size (3 \times 3 \times 144 \times 144=186,624 dimensions). I had previously made a crude fixed step Dormand Prince (8th order RK) solver to solve it, and it was able to run without memory issues.

I now tried to use DifferentialEquations.jl to do it. However, it was using too much memory, and got killed by the system (using Slurm on a cluster: slurmstepd: error: Detected 1 oom-kill event(s) in StepId=30632558.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler. )

Are there any ways to reduce the memory usage of DifferentialEquations.jl ? I have already used solver options to only keep data at specific times (as many as I was storing before):

# define the problem
tmax = Float64(dt * (nt - 1))
f = makef(H)
prob = ODEProblem(f, v, (0, tmax))

# solve it
# save every second
sol = solve(prob, saveat=dt,
            progress=true,
            reltol=1e-5)

See full function at semiclassical-monte-carlo/montecarlo.jl at 466cece1bfcf08da8bd9923d8c513f89cde337eb · Joh11/semiclassical-monte-carlo · GitHub

From what I’ve read in the docs, the saveat settings automatically sets dense to false, so it should not use more memory than my manual methods. Are there any other settings I am not aware of to reduce memory usage ?

ChrisRackauckas · April 15, 2021, 4:35pm

Try DP8(). This is defaulting to a method with automatic stiffness detection between a high order RK method and TRBDF2. Methods for stiff equations require building the Jacobian, which I presume is huge and since no sparsity pattern is specified it’ll be dense. Long story short, that dense Jacobian is probably what’s running out of memory, so choosing a method for non-stiff equations or supplying a sparsity pattern is the way to go.

Joh11 · April 15, 2021, 5:00pm

Thanks ! I’m trying that and will mark the topic solved if it works.
I did not think the stiffness would require the Jacobian.

So it is equivalent to setting alg_hints to :nonstiff right ?

ChrisRackauckas · April 15, 2021, 5:13pm

Not quite equivalent. If you set that then it’ll pick a non-stiff method, like Vern7() IIRC in this case. So it might be a different method but it’ll do “the same thing”.

Joh11 · April 15, 2021, 5:28pm

Ok ! That’s what I meant.

Unfortunately it still runs out of memory.

The solver stores the solution as a Vector{T}, where T is the type of the initial condition vector. However, to integrate with the rest of my code, I would like to convert the solution to a rank 4 Vector, with the last axis being the time index. I thus have the following code to copy the solution to such a vector (https://github.com/Joh11/semiclassical-monte-carlo/blob/466cece1bfcf08da8bd9923d8c513f89cde337eb/src/montecarlo.jl#L86):

# put it in a format usable by the rest of the code
    nt = length(sol)
    L = size(v)[2]
    ret = zeros(Vec3, H.Ns, L, L, nt)

    for n in eachindex(sol.u)
        @views ret[:, :, :, n] = sol.u[n]
    end
    
    ret

This obviously doubles the memory requirements of my code, and I suspect this is now what’s exhausting my memory. Is there a way to convert between these two data types without allocating a new vector ?

Else I will probably have to rework the rest of the code to use this, but I’d ideally keep it untouched.

ChrisRackauckas · April 15, 2021, 5:35pm

The sol object already acts like that: sol[i,j,k,n] if the initial condition is a 3-tensor.

Joh11 · April 16, 2021, 7:54am

Thanks ! Julia indexing is even more flexible than I was thinking… I have to look more into it.

However, with all of it, my program still runs out of memory…
Unless you know other tricks to reduce memory usage, I’ll just stick to my own method then, I do not see any other trivial waste of memory in my code.

ChrisRackauckas · April 16, 2021, 8:45am

I don’t get it. So all you’re doing is solve(prob,DP8(),saveat = dt) at the same exact dt as your other form and you’re saying it’s going OOM while exactly the same algorithm doesn’t go OOM? That makes absolutely no sense. I’d like to look into an MWE if that’s the case.

Joh11 · April 16, 2021, 9:12am

Indeed, that’s really strange.

Just to be clear, DP8 is exactly the RK method described by wikipedia ?

There is no chance my crude code does something more clever than the fantastic work of you and others at DifferentialEquations.jl

Let me first make sure everything is the same. Could the adaptative timestep mechanism be the thing consuming more memory ? Mine is fixed and predefined. (I’m running it right now with a fixed dt to check)

Also to make a MWE, is there a better way to check memory usage than triggering an OOM kill ? I am aware of the @allocated macro, but this does not give the peak memory usage, which is the bottleneck here.

ChrisRackauckas · April 16, 2021, 9:30am

No, you linked to DP5(), a 5th order method. DP8 is the 8-5-3 method of dop853 (the tableau isn’t really accessible anywhere… the dop853 code is the best reference on it). I presume that your 8th order method instead matches ExplicitRK(constructDormandPrince8_64bit()) exactly: it’s what you’d get if you search for an 8th order Dormand-Prince tableau, but it’s surprisingly not the one actually used in practice in the dop853 code.

github.com

SciML/DiffEqDevTools.jl/blob/master/src/ode_tableaus.jl#L7932-L7961


      
          A[17, 12] = convert(T,
                              big"-2.552203480132132516997563217309689292804518121743365818482497611667126218719069737195")
          A[17, 13] = convert(T,
                              big".3301343553488974584509065658432587594272577682203521183331319470982274757841566234948")
          A[17, 14] = convert(T,
                              big"-.9161854401769482236671414092387917470686251714192236693920061138984202381209109248553e-1")
          A[17, 15] = convert(T,
                              big"-1.525735678746850818217653470352135651821164556169070505816135230784807058389577753184")
          A[17, 16] = convert(T,
                              big".7371445601564892133467497107205798584829803038168267854389817508169123996459113657504")
          A[18, 1] = convert(T,
                             big".1017366974111576638766809656369828971944080018220332809259398740674738807023371082700")
          A[18, 2] = convert(T, big"0.")
          A[18, 3] = convert(T, big"0.")
          A[18, 4] = convert(T, big"0.")
          A[18, 5] = convert(T, big"0.")
          A[18, 6] = convert(T, big"0.")
          A[18, 7] = convert(T, big"0.")
          A[18, 8] = convert(T,
                             big"-1.696217553209432810711666838709742166182992092906177246174096517233561845662947862824")

This file has been truncated. show original

However, that all shouldn’t matter because the peak memory should be dominated by the saving procedure if you’re using a small enough dt.

github.com

SciML/OrdinaryDiffEq.jl/blob/master/src/caches/high_order_rk_caches.jl#L66-L87


      
          k6::rateType
          k7::rateType
          k8::rateType
          k9::rateType
          k10::rateType
          k11::rateType
          k12::rateType
          k13::rateType
          k14::rateType
          k15::rateType
          k16::rateType
          kupdate::rateType
          udiff::rateType
          bspl::rateType
          dense_tmp3::rateType
          dense_tmp4::rateType
          dense_tmp5::rateType
          dense_tmp6::rateType
          dense_tmp7::rateType
          utilde::uType

This file has been truncated. show original

DP8 requires 29 cache vectors and so if you’re saving like 100 steps then the saving behavior is going to dominate the memory usage. (That could be chopped down with checks to calck for when dense output is unused, but only by a little).

So there’s two possible things here. One is that dt is relatively large and the integrator memory is dominating. If that’s the case, then DP8 is just the wrong choice here, and Vern7 or Vern8 is just better while being less memory intensive.

github.com

SciML/OrdinaryDiffEq.jl/blob/master/src/caches/verner_caches.jl#L79-L114


      
              rtmp::rateType
              atmp::uNoUnitsType
              stage_limiter!::StageLimiter
              step_limiter!::StepLimiter
              thread::Thread
          end
          
          TruncatedStacktraces.@truncate_stacktrace Vern7Cache 1
          
          function alg_cache(alg::Vern7, u, rate_prototype, ::Type{uEltypeNoUnits},
              ::Type{uBottomEltypeNoUnits}, ::Type{tTypeNoUnits}, uprev, uprev2, f, t,
              dt, reltol, p, calck,
              ::Val{true}) where {uEltypeNoUnits, uBottomEltypeNoUnits, tTypeNoUnits}
              k1 = zero(rate_prototype)
              k2 = zero(rate_prototype)
              k3 = k2
              k4 = zero(rate_prototype)
              k5 = zero(rate_prototype)
              k6 = zero(rate_prototype)
              k7 = zero(rate_prototype)

This file has been truncated. show original

20 cache vectors for that. The Dormand-Prince 6/8 method only requires 15 cache vectors, but it would be more efficient to just use Vern7 for that same memory usage. You can check this by running Tsit5() or DP5(), which will match the Wikipedia link in memory and so if that’s what you actually implemented then it should be fine.

The other possibility is that dt is small and thus the solution itself is dominating memory, in which case I’d want to see if your current solve call is chopping everything out.

Joh11 · April 16, 2021, 10:23am

Ok that’s totally my bad then…

I did not really think about it, I thought it was 8th order because it required 8 intermediate steps. That’s why it works in my code then: I’m implementing DP5, so it requires less cache vectors than DP8.

I just tried with DP5() instead of DP8(), and it works without OOM now !

Given what the docs say, I’ll eventually change to Tsit5().

Thanks again for your precious help !

ChrisRackauckas · April 16, 2021, 10:53am

It’s 7 intermediate steps, cut down to 6 using the FSAL property. But that’s a minor detail . The bigger point is, it’s actually impossible to have an 8th order method in 8 steps. 4th order in 4 steps is the highest you can go with that property. That’s a classic result described here:

In general, if an explicit s-stage Runge–Kutta method has order p, then it can be proven that the number of stages must satisfy s >= p, and if p >= 5, then s >= p+1. However, it is not known whether these bounds are sharp in all cases; for example, all known methods of order 8 have at least 11 stages, though it is possible that there are methods with fewer stages.

So what you were looking for there is provably impossible. Most higher order methods take more steps than the minimum number in order to increase stability and reduce the leading truncation error coefficients. Sounds less efficient, but if you dig in that greatly improves the efficiency by allowing much larger dts for the same error, and thus decreases the total steps required. But this is getting into the weeds now.

Topic		Replies	Views
Solving a large system of differential equations Numerics question , diffeq	10	1998	September 29, 2021
Using DifferentialEquations.jl for very large system General Usage diffeq	7	4246	December 19, 2016
Reducing RAM usage when solving large set of differential equations Performance question	5	147	February 6, 2025
DifferentialEquations -- what goes wrong? General Usage	14	2407	November 13, 2019
Very high memory use with DelayDifferentialEquations Modelling & Simulations	12	305	July 21, 2025

DifferentialEquations.jl: running out of memory with large system size ODE

Related topics