DifferentialEquations.jl: running out of memory with large system size ODE

Hi everyone,

I need to solve a first order, homogeneous, time independent ODE on a relatively big system size (3 \times 3 \times 144 \times 144=186,624 dimensions). I had previously made a crude fixed step Dormand Prince (8th order RK) solver to solve it, and it was able to run without memory issues.

I now tried to use DifferentialEquations.jl to do it. However, it was using too much memory, and got killed by the system (using Slurm on a cluster: slurmstepd: error: Detected 1 oom-kill event(s) in StepId=30632558.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler. )

Are there any ways to reduce the memory usage of DifferentialEquations.jl ? I have already used solver options to only keep data at specific times (as many as I was storing before):

# define the problem
tmax = Float64(dt * (nt - 1))
f = makef(H)
prob = ODEProblem(f, v, (0, tmax))

# solve it
# save every second
sol = solve(prob, saveat=dt,
            progress=true,
            reltol=1e-5)

See full function at semiclassical-monte-carlo/montecarlo.jl at 466cece1bfcf08da8bd9923d8c513f89cde337eb · Joh11/semiclassical-monte-carlo · GitHub

From what I’ve read in the docs, the saveat settings automatically sets dense to false, so it should not use more memory than my manual methods. Are there any other settings I am not aware of to reduce memory usage ?

Try DP8(). This is defaulting to a method with automatic stiffness detection between a high order RK method and TRBDF2. Methods for stiff equations require building the Jacobian, which I presume is huge and since no sparsity pattern is specified it’ll be dense. Long story short, that dense Jacobian is probably what’s running out of memory, so choosing a method for non-stiff equations or supplying a sparsity pattern is the way to go.

4 Likes

Thanks ! I’m trying that and will mark the topic solved if it works.
I did not think the stiffness would require the Jacobian.

So it is equivalent to setting alg_hints to :nonstiff right ?

Not quite equivalent. If you set that then it’ll pick a non-stiff method, like Vern7() IIRC in this case. So it might be a different method but it’ll do “the same thing”.

Ok ! That’s what I meant.

Unfortunately it still runs out of memory.

The solver stores the solution as a Vector{T}, where T is the type of the initial condition vector. However, to integrate with the rest of my code, I would like to convert the solution to a rank 4 Vector, with the last axis being the time index. I thus have the following code to copy the solution to such a vector (semiclassical-monte-carlo/montecarlo.jl at 466cece1bfcf08da8bd9923d8c513f89cde337eb · Joh11/semiclassical-monte-carlo · GitHub):

# put it in a format usable by the rest of the code
    nt = length(sol)
    L = size(v)[2]
    ret = zeros(Vec3, H.Ns, L, L, nt)

    for n in eachindex(sol.u)
        @views ret[:, :, :, n] = sol.u[n]
    end
    
    ret

This obviously doubles the memory requirements of my code, and I suspect this is now what’s exhausting my memory. Is there a way to convert between these two data types without allocating a new vector ?

Else I will probably have to rework the rest of the code to use this, but I’d ideally keep it untouched.

The sol object already acts like that: sol[i,j,k,n] if the initial condition is a 3-tensor.

1 Like

Thanks ! Julia indexing is even more flexible than I was thinking… I have to look more into it.

However, with all of it, my program still runs out of memory…
Unless you know other tricks to reduce memory usage, I’ll just stick to my own method then, I do not see any other trivial waste of memory in my code.

I don’t get it. So all you’re doing is solve(prob,DP8(),saveat = dt) at the same exact dt as your other form and you’re saying it’s going OOM while exactly the same algorithm doesn’t go OOM? That makes absolutely no sense. I’d like to look into an MWE if that’s the case.

Indeed, that’s really strange.

Just to be clear, DP8 is exactly the RK method described by wikipedia ?

There is no chance my crude code does something more clever than the fantastic work of you and others at DifferentialEquations.jl :slight_smile:

Let me first make sure everything is the same. Could the adaptative timestep mechanism be the thing consuming more memory ? Mine is fixed and predefined. (I’m running it right now with a fixed dt to check)

Also to make a MWE, is there a better way to check memory usage than triggering an OOM kill ? I am aware of the @allocated macro, but this does not give the peak memory usage, which is the bottleneck here.

No, you linked to DP5(), a 5th order method. DP8 is the 8-5-3 method of dop853 (the tableau isn’t really accessible anywhere… the dop853 code is the best reference on it). I presume that your 8th order method instead matches ExplicitRK(constructDormandPrince8_64bit()) exactly: it’s what you’d get if you search for an 8th order Dormand-Prince tableau, but it’s surprisingly not the one actually used in practice in the dop853 code.

However, that all shouldn’t matter because the peak memory should be dominated by the saving procedure if you’re using a small enough dt.

DP8 requires 29 cache vectors and so if you’re saving like 100 steps then the saving behavior is going to dominate the memory usage. (That could be chopped down with checks to calck for when dense output is unused, but only by a little).

So there’s two possible things here. One is that dt is relatively large and the integrator memory is dominating. If that’s the case, then DP8 is just the wrong choice here, and Vern7 or Vern8 is just better while being less memory intensive.

20 cache vectors for that. The Dormand-Prince 6/8 method only requires 15 cache vectors, but it would be more efficient to just use Vern7 for that same memory usage. You can check this by running Tsit5() or DP5(), which will match the Wikipedia link in memory and so if that’s what you actually implemented then it should be fine.

The other possibility is that dt is small and thus the solution itself is dominating memory, in which case I’d want to see if your current solve call is chopping everything out.

Ok that’s totally my bad then…

I did not really think about it, I thought it was 8th order because it required 8 intermediate steps. That’s why it works in my code then: I’m implementing DP5, so it requires less cache vectors than DP8.

I just tried with DP5() instead of DP8(), and it works without OOM now !

Given what the docs say, I’ll eventually change to Tsit5().

Thanks again for your precious help !

It’s 7 intermediate steps, cut down to 6 using the FSAL property. But that’s a minor detail :wink:. The bigger point is, it’s actually impossible to have an 8th order method in 8 steps. 4th order in 4 steps is the highest you can go with that property. That’s a classic result described here:

In general, if an explicit s-stage Runge–Kutta method has order p, then it can be proven that the number of stages must satisfy s >= p, and if p >= 5, then s >= p+1. However, it is not known whether these bounds are sharp in all cases; for example, all known methods of order 8 have at least 11 stages, though it is possible that there are methods with fewer stages.

So what you were looking for there is provably impossible. Most higher order methods take more steps than the minimum number in order to increase stability and reduce the leading truncation error coefficients. Sounds less efficient, but if you dig in that greatly improves the efficiency by allowing much larger dts for the same error, and thus decreases the total steps required. But this is getting into the weeds now.

2 Likes