Parallel simulation : coroutine vs multi-threading

Hi, community!

I’m usually using DiffEq.jl to perform dynamical system simulation. To perform simulation with multiple conditions simultaneously (to reduce the overall wall time), I often use multi-threading via Transducers.tcollect as it is convenient.

However, I found that it often causes OOM (out of memory) error. So I searched some parallel and asyncronous computing, and finally there are some candidates for my desire: coroutine and multi-threading.

  1. Which one is better for this case? Note that each simulation takes about < 10~100s.
  2. If coroutine is a good choice, is there any convenient way to use it as tcollect?

If you’re hitting an OOM, more parallelism won’t help it, it would do the exact opposite. Instead ask the question, why are you going OOM? Are you saving more than you need to? Are you parallelizing a ton of simulations at the same time where if you ran say 4 at the same time your computer would have enough memory? Is it simply something that requires a lot of memory so you need to go distributed on a cluster?

5 Likes

Yes, I agree with Chris that you’d need to think about limiting parallelism if you want to solve OOM.

There are also probably some ways to reduce memory usage, depending on exactly what you are doing. (So MWE or some sketch of your program would be useful.)

As a toy example, if your program computes the trajectories of all the instances of the dynamical systems and then dump them to files like this

solutions = tcollect(solve(prob) for prob in systems)
for sol in solutions
    save_to_file(sol)
end

it would be much more memory efficient if you just dump the solution to the file right away (so that unused sol can be GC’ed away):

Threads.@threads for prob in systems
    sol = solve(prob)
    save_to_file(sol)
end

As another example, if you have some kind of analysis pipeline after solving the system:

solutions = tcollect(solve(prob) for prob in systems)
statistics = tcollect(compute_some_statistics(sol) for sol in solutions)

it’d be much better (both in terms of the run-time and memory usage) to do the analysis right away:

solutions = tcollect(compute_some_statistics(solve(prob)) for prob in systems)

If you implemented tricks like the above already and still need to optimize your program for memory efficiency, you might want to look at some examples in Concurrency patterns for controlled parallelisms to limit the memory usage.

5 Likes

Thanks you all.

@tkf, your examples are so straightforward and helpful.
Simple changes like your suggestion seems to solve most errors in my case (especially NOT compute all trajectories then dump).

Thank you :slight_smile:

One more question: your example uses Threads.@threads. Instead, can I use tcollect in that example?

Yes, you can use tcollect but it’s not super useful when there’s no output. If there are no outputs, you can use simpler APIs like Threads.@threads and Folds.foreach (the latter may be nice for load-balancing).

On the other hand, for example, tcollect could be useful if you want a list of file paths:

paths = systems |>
    Enumerate() |>
    Map() do (i, prob)
        sol = solve(prob)
        if isgood(sol)
            path = "data-$i.???"
            save(path, sol)
            return path
        else
            return nothing
        end
    end |>
    Filter(!isnothing) |>
    tcollect
1 Like