Estimate memory requirements of ODE solver

Hi everyone,
I am solving a very large (~10^8 equations) system of ODEs. I would like to use the VCABM solver from OrdinaryDiffEq, since it seems to be the fastest, but I am running into memory issues that I would like to understand better.

I am aware that the ODE solvers allocate their cache depending on the order.
Looking at the implementation in OrdinaryDiffEq, it appears that VCABM needs quite a few caches, about 60:

function alg_cache(alg::VCABM, u, rate_prototype, ::Type{uEltypeNoUnits},
    ::Type{uBottomEltypeNoUnits}, ::Type{tTypeNoUnits}, uprev, uprev2, f, t,
    dt, reltol, p, calck,
    ::Val{true}) where {uEltypeNoUnits, uBottomEltypeNoUnits, tTypeNoUnits}
    fsalfirst = zero(rate_prototype)
    k4 = zero(rate_prototype)
    dts = fill(zero(dt), 13)
    c = fill(zero(t), 13, 13)
    g = fill(zero(t), 13)
    ϕ_n = Vector{typeof(rate_prototype)}(undef, 13)
    ϕstar_nm1 = Vector{typeof(rate_prototype)}(undef, 13)
    ϕstar_n = Vector{typeof(rate_prototype)}(undef, 13)
    ϕ_np1 = Vector{typeof(rate_prototype)}(undef, 14)
    for i in 1:13
        ϕ_n[i] = zero(rate_prototype)
        ϕstar_nm1[i] = zero(rate_prototype)
        ϕstar_n[i] = zero(rate_prototype)
    end
    for i in 1:14
        ϕ_np1[i] = zero(rate_prototype)
    end
    β = fill(zero(t), 13)
    order = 1
    max_order = 12
    atmp = similar(u, uEltypeNoUnits)
    recursivefill!(atmp, false)
    tmp = zero(u)
    ξ = zero(dt)
    ξ0 = zero(dt)
    utilde = zero(u)
    utildem2 = zero(u)
    utildem1 = zero(u)
    utildep1 = zero(u)
    atmp = similar(u, uEltypeNoUnits)
    recursivefill!(atmp, false)
    atmpm1 = similar(u, uEltypeNoUnits)
    recursivefill!(atmpm1, false)
    atmpm2 = similar(u, uEltypeNoUnits)
    recursivefill!(atmpm2, false)
    atmpp1 = similar(u, uEltypeNoUnits)
    recursivefill!(atmpp1, false)
    VCABMCache(u, uprev, fsalfirst, k4, ϕstar_nm1, dts, c, g, ϕ_n, ϕ_np1, ϕstar_n, β, order,
        max_order, atmp, tmp, ξ, ξ0, utilde, utildem1, utildem2, utildep1, atmpm1,
        atmpm2, atmpp1, 1)
end

I confirmed this by printing sizeofcache = Base.format_bytes(Base.summarysize(cache)), which gives 64 times the size of my state array.
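For anyone who wants to reproduce this measurement, a minimal sketch on a toy problem (the state size and right-hand side here are made up) is:

```julia
using OrdinaryDiffEq

# Toy stand-in for the real system; sizes here are made up.
u0 = zeros(10^6)
f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, u0, (0.0, 1.0))

# init builds the integrator, including the solver cache, without stepping,
# so summarysize on the cache reflects the solver's resident memory.
integ = init(prob, VCABM(); save_everystep = false)
sizeofcache = Base.format_bytes(Base.summarysize(integ.cache))
ratio = Base.summarysize(integ.cache) / Base.summarysize(u0)
```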

This is quite a lot, but it should still fit in my memory: even at 100 caches the size of my state array, I should only be at 56 GB.
Nonetheless, I find my calculation consumes 2-3 times that amount (~133 GB).

Can anyone tell me whether this is expected? How do I find out why so much memory is allocated? A few more points:

  • I use save_everystep = false. More generally, the memory consumption does not increase after the simulation has started, so it is not a memory leak.
  • I realize that arrays may allocate more memory than they strictly need, but I believe this should not be an issue here, since I allocate my state with zeros(dims…)
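The last point is easy to verify: a dense Array created with zeros carries essentially no overhead beyond its payload, e.g.:

```julia
# A dense Float64 array: the payload is exactly length * sizeof(Float64),
# and summarysize adds only a small fixed object header on top.
u = zeros(Float64, 10^6)
payload = sizeof(u)                       # 8_000_000 bytes
overhead = Base.summarysize(u) - payload  # a few dozen bytes at most
```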

You can look at the allocation profiler.
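A sketch of a minimal invocation (assuming Julia ≥ 1.8 and that PProf.jl is installed for viewing; the toy problem stands in for yours):

```julia
using OrdinaryDiffEq, Profile
import PProf  # assumed installed; only needed for viewing the results

# Toy problem standing in for the real one.
u0 = zeros(10^5)
f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, u0, (0.0, 1.0))

# Record every allocation made during the solve (sample_rate = 1.0).
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate = 1.0 solve(prob, VCABM(); save_everystep = false)
PProf.Allocs.pprof()  # serves an interactive flame graph in the browser
```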

Someone could take a dive into VCABM though and see if the total number of arrays could be reduced.


Thanks, that is a good idea!

I have had a look, though unfortunately it attributes the majority of the allocated memory to UnknownType, which is apparently a known issue.
Perhaps I can still figure out what the problem is. Am I safe in assuming that the ODE solvers will never allocate memory depending on the parameters I am passing? My params contain a few more buffer arrays, but I think they should not contribute much unless there are also ~60 copies of them.

Of course it would be awesome if the overall memory requirement could be reduced, especially given that VCABM is great for very big ODEs.

Yes, they don’t copy it.

For very big ODEs, you should use a low-memory RK method. That’s their speciality.
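For example, one of the low-storage 2N methods can be swapped in directly (CarpenterKennedy2N54 shown here as an illustration; whether it suits your problem depends on its stiffness and accuracy requirements):

```julia
using OrdinaryDiffEq

# Low-storage "2N" RK methods keep roughly two state-sized registers
# instead of the dozens of caches a high-order Adams method holds.
u0 = zeros(10^6)  # stand-in for a very large state
f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, u0, (0.0, 1.0))
sol = solve(prob, CarpenterKennedy2N54(); save_everystep = false)
```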

I assume the low-memory methods should be the choice only if the standard RK solvers (e.g. Tsit5 or Vern7) consistently do not fit in memory; as long as there is memory left, isn't it better to choose a method geared towards speed instead of memory?
So in my case, given that I can use most other RK solvers, I don't gain any benefit from the low-memory methods. For me the choice is between Tsit5 (comfortably fits in memory) and VCABM (faster, but can require too much memory).

Given that VCABM should on paper only allocate ~60-80 times the memory of my ODE state, it is likely the best option if I can figure out where the other ~70 GB of memory have gone.

As a debugging aid, I have added the following to a FunctionCallingCallback:

totalmem = printMem((integrator,State,setup,saved_values)) 
@info "Memory usage: " totalmem integrator = printMem(integrator) State = printMem(State) parameters = printMem(parameters) savedvalues = printMem(saved_values) integOverState = Base.summarysize(integrator) / Base.summarysize(State)

which should give me an overview of all the allocated objects.
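For reference, printMem is just a thin wrapper around Base.summarysize; a sketch of it and the callback wiring:

```julia
using DiffEqCallbacks

# Thin wrapper: human-readable size of an object graph.
printMem(x) = Base.format_bytes(Base.summarysize(x))

# FunctionCallingCallback fires the function on every accepted step.
memcb = FunctionCallingCallback(; func_everystep = true) do u, t, integrator
    @info "Memory usage:" integrator = printMem(integrator) state = printMem(u)
end
```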

┌ Info: Memory usage: 
│   totalmem = "4.598 GiB"
│   integrator = "4.598 GiB"
│   State = "65.552 MiB"
│   parameters = "248.153 MiB"
│   savedvalues = "2.312 MiB"
└   integOverState = 71.821170284851291

Nearly all of the memory is taken by the integrator, and it is about as large as it should be, i.e. ~72x the state.
So I take it that the ODE should fit comfortably in my memory (even if the state is larger, about 600 MB, for my original problem). Probably my issue is not within OrdinaryDiffEq; I will investigate whether it's an issue with the GC not being aggressive enough.


Yeah, use the best method that fits into memory. I was just mentioning that if you do have very large ODEs, some methods are designed to fit into less memory, with the trade-off of generally being a less efficient method otherwise.


Ok, turns out I was being stupid: I forgot that my ODE state is a partitioned array, which is three times the size I was quoting. So all the memory is precisely accounted for.
Sorry for the bait!
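For reference, the factor is easy to see with a partitioned state, e.g. an ArrayPartition of three equally sized components (hypothetical sizes):

```julia
using RecursiveArrayTools

# A partitioned state of three equally sized components is ~3x the size
# of a single component — easy to overlook when estimating memory.
a = zeros(10^6)
u0 = ArrayPartition(a, copy(a), copy(a))
factor = Base.summarysize(u0) / Base.summarysize(a)  # ≈ 3
```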
