I am running a stochastic model using JumpProcesses.jl. The lowest consistent memory usage I can achieve is between 8-16 GB, with simulations that take 10-20 seconds. I would assume that for such a time-efficient simulation the memory usage should not be so high.
Some background:
The model is linked to an ODEProblem that controls a very simple ODE via a discrete callback, and it has 5 mass action jumps (wrapped into 1 MassActionJump), 2 ConstantRateJumps and 6 unbounded VariableRateJumps, with 9 species in total. I run the model for 6 different conditions, where 1 parameter is changed for each condition. For all these conditions I have tried different aggregators (Direct(), DirectFW(), DirectCR(), SortingDirect(), RDirect(), NRM()), and each performs slightly differently for each condition, but the most efficient one I can get working is SortingDirect, where ~15 GB of RAM is used for each simulation, with run times of ~10-20 seconds.
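For concreteness, this is the general shape of how the mass action part is wrapped into one MassActionJump (the rate constants and stoichiometries below are placeholders, not my actual model):

using JumpProcesses

# Placeholder sketch: 5 mass action reactions wrapped into a single MassActionJump.
rates           = [1.0, 0.5, 0.1, 0.2, 0.05]
reactant_stoich = [[1 => 1], [2 => 1], [1 => 1, 2 => 1], [3 => 1], [4 => 2]]
net_stoich      = [[1 => -1, 2 => 1], [2 => -1], [1 => -1, 2 => -1, 3 => 1],
                   [3 => -1, 4 => 1], [4 => -2, 5 => 1]]
maj = MassActionJump(rates, reactant_stoich, net_stoich)
# ...then combined with the ConstantRateJumps / VariableRateJumps via e.g.
# JumpProblem(ode_prob, SortingDirect(), JumpSet(maj, crjs..., vrjs...))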
I have tried many other memory-reducing techniques (a rough sketch showing where each option is passed follows the list):
save_positions=(false,false) in JumpProblem
safetycopy=false
SVectors for initial conditions and parameters (although these only seem to work with the Direct() aggregator?)
different solvers (I use Tsit5())
different values for saveat (makes hardly any difference)
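Here is a rough sketch of where these options go, with a toy birth-death problem standing in for my actual model (all values, jump definitions, trajectory counts and saveat spacing below are placeholders):

using DifferentialEquations, JumpProcesses

# Toy problem: u[1] is an ODE variable, u[2] is a jump species.
decay!(du, u, p, t) = (du .= 0.0; du[1] = -0.01 * u[1]; nothing)
birth = ConstantRateJump((u, p, t) -> p.kb, integrator -> (integrator.u[2] += 1; nothing))
death = VariableRateJump((u, p, t) -> p.kd * u[2], integrator -> (integrator.u[2] -= 1; nothing))

oprob = ODEProblem(decay!, [100.0, 0.0], (0.0, 100.0), (kb = 10.0, kd = 0.1))
jprob = JumpProblem(oprob, SortingDirect(), JumpSet(birth, death);
                    save_positions = (false, false))    # don't save pre/post-jump states

eprob = EnsembleProblem(jprob; safetycopy = false)      # skip the per-trajectory copy
sol = solve(eprob, Tsit5(), EnsembleDistributed();
            trajectories = 10, saveat = 1.0)            # save on a coarse grid only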
From all this I can only assume that the memory used is so large because of the sheer number of reactions occurring and molecules being tracked (~10,000 for 1 or 2 species depending on the condition, and in the 10s or 100s for the others, let's say). Running this model for 1 instance is not a problem; however, when using EnsembleDistributed() with any more than ~500 trajectories it crashes the server I am using, which has 125 GB of RAM.
Any ideas how I could potentially reduce the RAM usage with these simulations?
I have created an MWE that uses a large amount of RAM, ~7 GB (see below). I think it uses so much memory for the same reason my actual problem does (just a lot of stochastic reactions occurring); for instance, the shorter the timespan, the less memory both problems use, by a significant amount. I'm wondering whether there is some specific code optimisation to use with this problem, or whether there is no way around this in a fully stochastic model due to the large number of jumps occurring.
using DifferentialEquations, JumpProcesses, BenchmarkTools

function params()
    return (
        k_fast_production = 1000.0,
        k_slow_production = 50.0,
        k_fast_decay = 50.0,
        k_slow_decay = 0.5,
        n_species = 10
    )
end

# u[1] is the controlled ODE variable; the jump species occupy u[2:end]
function control_ode!(du, u, p, t)
    du .= 0.0               # jump species have no deterministic dynamics
    du[1] = -0.01 * u[1]
    return nothing
end

initial_jump_species = zeros(params().n_species)
initial_u = vcat([100.0], initial_jump_species)
tspan = (0.0, 1000.0)

# Set up jump processes with varied rates and a large total number of jumps
jumps = []
for i in 1:params().n_species
    # species i is stored at u[i + 1] (u[1] is the ODE variable)
    if i == 1
        # High-expression species
        fast_production_rate(u, p, t) = p.k_fast_production
        fast_decay_rate(u, p, t) = p.k_fast_decay
        affect_fast_production!(integrator) = (integrator.u[i + 1] += 1; nothing)
        affect_fast_decay!(integrator) = (integrator.u[i + 1] -= 1; nothing)
        push!(jumps, ConstantRateJump(fast_production_rate, affect_fast_production!))
        push!(jumps, VariableRateJump(fast_decay_rate, affect_fast_decay!))
    else
        # Low-expression species with varied production and decay rates
        slow_production_rate(u, p, t) = p.k_slow_production * i
        slow_decay_rate(u, p, t) = p.k_slow_decay * i
        affect_slow_production!(integrator) = (integrator.u[i + 1] += 1; nothing)
        affect_slow_decay!(integrator) = (integrator.u[i + 1] -= 1; nothing)
        push!(jumps, ConstantRateJump(slow_production_rate, affect_slow_production!))
        push!(jumps, VariableRateJump(slow_decay_rate, affect_slow_decay!))
    end
end

control_prob = ODEProblem(control_ode!, initial_u, tspan, params())
jumpset = JumpSet(jumps...)
jump_prob = JumpProblem(control_prob, Direct(), jumpset; save_positions = (false, false))
@btime solve(jump_prob, Tsit5(), maxiters = 1e10)
Building an array of 4119719 things takes a lot of memory! You have to use saveat if you’re going to limit that. By default it will just save at each step, but you likely don’t need all of the per-step information in this case.
So it’s simply a function of how much you’re choosing to save.
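For example, with the MWE above, something like (the saveat spacing is arbitrary):

# Save only every 10 time units instead of at every step.
sol = solve(jump_prob, Tsit5(); saveat = 10.0, maxiters = 1e10)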
Note that some improvements to VariableRateJumps are also likely to come within the next year; those are the bigger factor in your performance here.
Ah, this is interesting. I am using saveat in my original code but I do not see the memory improvement that shows up in this example: I see a decrease in array size but no difference in memory allocations, which seems odd.
Just tried this and I still see no difference. I can run it and get an array of length 40, and it still takes the same number of GB as when the saved array was 4 million long.
Also, for the example here, I've confirmed that with these settings the memory use is the same for tspan = (0.0, 10.0) vs (0.0, 1000.0). So it really does seem like it might be something else in your actual example.
Is Tsit5 the solver you are using? For your actual code, are you sure you aren't accidentally allocating in one of your rate or affect functions?
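For example, a rate written like the first (hypothetical) one below allocates a temporary array on every call, while the view-based version does not:

# Hypothetical rates, just to illustrate an easy-to-miss allocation.
rate_alloc(u, p, t)   = p.k * sum(u[2:end])            # the slice u[2:end] allocates every call
rate_noalloc(u, p, t) = p.k * sum(@view(u[2:end]))     # a view avoids that allocation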
Are you sure the issues you are seeing aren't actually arising from the EnsembleProblem, i.e. do you see memory issues if you just call solve(prob,...) many times in a loop?
EnsembleProblems unfortunately do not interact well with JumpProblems currently, and can call deepcopy(prob,...) on every solve depending on how they are being used. For this reason I just call solve in a loop myself, and if I need to split tasks among cores handle that myself too.
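Roughly like this (a serial sketch; how I split the trajectories across cores, with each task getting its own copy of the problem, is left out, and the saveat spacing and stored summary are just examples):

# Plain serial loop; repeated solves of the same JumpProblem re-initialize the
# aggregator, and only a per-trajectory summary is stored, not the full solution.
n_traj = 500
finals = Vector{Float64}(undef, n_traj)
for i in 1:n_traj
    sol = solve(jump_prob, Tsit5(); saveat = 10.0)
    finals[i] = sol.u[end][1]    # e.g. keep just the final value of the first state
end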
Yes, in my benchmarking tests I’m calling solve(prob) outside of an ensemble problem. I agree the current MWE doesn’t seem to be completely reproducing my problems so I’ll see if I can get something working that does.