How to debug memory issues (when using DifferentialEquations)

I’m trying to understand an apparent memory issue with a simulation I’m running.

Here’s a simplified version of what I’m running:

using DifferentialEquations, Plots


# ODE right-hand side. u holds the capacitor voltages, inductor currents, and the
# drive phase Vp; p = [fv, VAmag] holds the drive frequency and amplitude.
function usode!(du,u,p,t)
    C1 = 1.63e-9
    RS = 790.0
    C2 = 3.3e-9
    C0 = 0.3e-9
    L1 = 1e-3
    L2 = 82e-3
    CM = 4e-12
    LM = 8.0
    RM = 7500.0
    VC1, VC2, VC0, VCM, IL1, IL2, ILM, Vp = u
    fv, VAmag = p
    VA = VAmag*sinpi(2*Vp)   # sinusoidal drive voltage at phase Vp
    du[1] = 1/C1*IL1
    du[2] = -1/C2*IL1
    du[3] = 1/C0*(IL1-IL2-ILM)
    du[4] = 1/CM*ILM
    du[5] = 1/L1*(VA-VC1-VC0-VC2-IL1*RS)
    du[6] = 1/L2*VC0
    du[7] = 1/LM*(VC0-VCM-ILM*RM)
    du[8] = fv               # Vp integrates the drive frequency
end

mutable struct Controller
    f::Float64
end
# Callable used with PeriodicCallback: writes the current frequency into the
# parameter vector, then ramps it up by 15 Hz per call until it reaches 30 kHz.
function (c::Controller)(integrator)
    integrator.p[1] = c.f
    if c.f < 30000.0
        c.f += 15.0
    end
    println(integrator.t)
end

function sim()
    p = [0.0, 100.0]
    cb1 = PeriodicCallback(Controller(27000.0),0.005)   # adjust the drive frequency every 5 ms
    cbs = CallbackSet(cb1,)
    u0 = [0.0,0.0,0.0,0.0, 0.0,0.0,0.0, 0.0]
    tspan = [0.0, 1.1]
    prob = ODEProblem(usode!,u0,tspan,p)
    @time sol = solve(prob,Tsit5(), callback=cbs, reltol=1e-8, abstol=1e-8, maxiters=10_000_000)
    sol
end

sol = sim()

Execution takes about 30 seconds on this computer (4 cores / 8 threads, 16 GB RAM, Windows 10). Watching the Memory view in Task Manager along with the print statements from the code, I can see pauses that I presume are garbage-collector runs. It seems to be doing what I want, and it happens to use nearly all my RAM.

However, if I make a small change to the constants in function usode! and execute the last line again (using Atom/Juno and Ctrl-Enter), the next run is slow at first, then painfully slow, then essentially grinds to a halt. If you wait long enough, maybe close everything else on the PC, and have a drink, it eventually finishes (1432 seconds).

What I expected was that when I executed sol = sim(), the giant solution structure previously pointed to by sol would be released and garbage collected, but it seems that didn’t happen.

So, I killed the Julia session, started again, and this time did a sol = nothing after the first run to more explicitly let the system reclaim the memory. Watching in Task Manager, it’s clear that this only partly works, and now I’m doing the dishes while waiting to see the execution time. It is also just hammering the SSD now, with the CPU sputtering along, waiting on everyone else. 1108 seconds!

So what is keeping all that memory in use, and what can I do to tell Julia (and maybe some part of Atom/Juno) to let go of it?

To me that sounds like it started swapping. When a GC language starts swapping, things go bad really quickly. I think your sol = nothing is on the right track. What I would suggest you try is replacing the last line with:

sol = nothing
GC.gc()
sol = sim()

That will make sure sol isn’t keeping anything around, force the garbage collector to run, and then start the calculation again.

Hi @pixel27, I’m trying that, but the GC.gc() did not release all the memory, something is still hanging on to it.

Well, that’s disturbing. This is a long shot, but it might give you an idea. If your Julia file is test.jl, start Julia like:

julia --track-allocation=all test.jl

That will write out .mem files recording ALL memory allocations, line by line. Then modify the file to replace your sim() call with:

sol = sim()
sol = nothing
sol = sim()

And again run julia with:

julia --track-allocation=all test.jl

If you compare the two sets of files, all of your allocations should roughly double, but what you want to look for is anything that grew more than it should have. The tricky part will be figuring out what shouldn’t have increased just from running it twice.
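
If it helps, here’s a rough sketch of one way to digest the .mem files once they exist. It assumes the Coverage.jl package (its analyze_malloc helper scans a directory tree for .mem files); also, calling Profile.clear_malloc_data() after a warm-up run keeps compilation allocations from swamping the numbers.

# Sketch (assumes Coverage.jl is installed): summarize the *.mem files that
# --track-allocation=all leaves behind after Julia exits.
using Coverage

allocs = analyze_malloc(["."])          # scan the current directory tree for *.mem files
for a in allocs[max(1, end-19):end]     # results are sorted by size; show the largest sites
    println(a.bytes, "\t", a.filename, ":", a.linenumber)
end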

Or wait for someone with a better suggestion to respond. :slight_smile:

1124s. Using GC.gc() clearly didn’t release the memory.

I’ll try the --track-allocation and also try running outside of Atom/Juno.

I think it might be a Juno/Atom thing. With sol = nothing added, another run of the sim, and everything run from the command line, the memory seems to get released and I get similar speeds (actually faster the second time because there’s no compilation).

Now I’m not so sure. Trying to make an MWE for the Juno group, I’m getting the same behavior from the command line. There’s a variable I’m not controlling. Argh.

:frowning:

I think it is maybe an Atom/Juno thing. I’m going to make a new post about it.

I have experienced something similar in Jupyter notebooks although I haven’t really been able to reproduce the behavior in a small example.

Hi Chris,

Looking at the sol object, there are the .t (Vector{Float64}) and .u (Vector{Array{Float64,1}}) fields I expected, but what is the .k (Vector{Array{Array{Float64,1},1}}) that I’m seeing? I was trying to think about how much storage the solution object should require, and for about 7 million points, 8 state variables plus time, I figure I should need on the order of 500 MB, but based on the Task Manager indications it’s consuming about 20x that.
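
For reference, my back-of-the-envelope arithmetic for the raw .t/.u storage (ignoring .k and any per-array overhead), assuming roughly 7 million saved steps:

nsteps  = 7_000_000                                  # assumed number of saved time steps
nstates = 8                                          # state variables per step
bytes   = nsteps * (nstates + 1) * sizeof(Float64)   # +1 for the time value itself
println(bytes / 1e6, " MB")                          # ≈ 504 MB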

Thanks.

.k is for the interpolations, and indeed it’s the most memory-heavy portion. Set dense=false, or use saveat, to avoid this. It should be documented as the most memory-intensive part in https://docs.sciml.ai/latest/basics/common_solver_opts/, but if that needs more clarity please let me know.
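
For example, something along these lines should cut it way down (a sketch based on the solve call in sim() above; the saveat spacing here is just an illustrative value):

# Skip storing the dense-output coefficients (.k); only keep values at the saveat points.
sol = solve(prob, Tsit5(), callback=cbs, reltol=1e-8, abstol=1e-8,
            maxiters=10_000_000, dense=false, saveat=1e-4)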

Thank you! That makes a huge difference in memory usage and in speed. When I have a bit more time I will look through the docs and see if I have suggestions.

If one needs only the final solution and no intermediate values, setting save_everystep=false should also help, shouldn’t it? I’ve experienced some memory issues even with that option (and using a callback to save ca. 1000 scalar intermediate values).

Yeah, if that’s all you need then turn off all intermediate saves. If that’s done, then DiffEq isn’t saving anything, so I’d be surprised to hear of a memory issue.
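
For what it’s worth, here’s a sketch of that combination, reusing prob and cb1 from the sim() above; the SavingCallback/SavedValues pieces are from DiffEqCallbacks.jl, and the saved quantity (u[1]) and the saveat grid are just placeholders:

using DiffEqCallbacks

# Record ~1000 scalar values along the way while keeping only the final state in sol.
saved   = SavedValues(Float64, Float64)                  # (time type, saved-value type)
save_cb = SavingCallback((u, t, integrator) -> u[1],     # quantity to record (placeholder)
                         saved; saveat=0.0:0.0011:1.1)   # ≈ 1000 points over tspan
sol = solve(prob, Tsit5(), callback=CallbackSet(cb1, save_cb),
            save_everystep=false, save_start=false, reltol=1e-8, abstol=1e-8)
# saved.t and saved.saveval now hold the recorded scalars; sol.u holds just the end state.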