Parallel simulation : coroutine vs multi-threading

iHany · December 22, 2021, 1:54am

Hi, community!

I usually use DiffEq.jl to simulate dynamical systems. To run simulations under multiple conditions simultaneously (to reduce the overall wall time), I often use multi-threading via Transducers.tcollect because it is convenient.

However, I found that it often causes OOM (out of memory) errors. So I read up on parallel and asynchronous computing, and ended up with two candidates: coroutines and multi-threading.

Which one is better for this case? Note that each simulation takes up to roughly 10 to 100 s.
If coroutines are a good choice, is there a way to use them that is as convenient as tcollect?

ChrisRackauckas · December 22, 2021, 3:19am

If you’re hitting an OOM, more parallelism won’t help; it would do the exact opposite. Instead, ask the question: why are you going OOM? Are you saving more than you need to? Are you parallelizing a ton of simulations at the same time, where if you ran, say, 4 at a time your computer would have enough memory? Or is it simply something that requires so much memory that you need to go distributed on a cluster?

tkf · December 22, 2021, 6:01am

Yes, I agree with Chris that you’d need to think about limiting parallelism if you want to solve OOM.

There are also probably some ways to reduce memory usage, depending on exactly what you are doing. (So an MWE or some sketch of your program would be useful.)

As a toy example, if your program computes the trajectories of all the instances of the dynamical systems and then dumps them to files like this

solutions = tcollect(solve(prob) for prob in systems)
for sol in solutions
    save_to_file(sol)
end

it would be much more memory efficient if you just dump the solution to the file right away (so that unused sol can be GC’ed away):

Threads.@threads for prob in systems
    sol = solve(prob)
    save_to_file(sol)
end

As another example, if you have some kind of analysis pipeline after solving the system:

solutions = tcollect(solve(prob) for prob in systems)
statistics = tcollect(compute_some_statistics(sol) for sol in solutions)

it’d be much better (both in terms of the run-time and memory usage) to do the analysis right away:

statistics = tcollect(compute_some_statistics(solve(prob)) for prob in systems)

If you implemented tricks like the above already and still need to optimize your program for memory efficiency, you might want to look at some examples in Concurrency patterns for controlled parallelisms to limit the memory usage.
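To illustrate what limiting parallelism can look like, here is a minimal sketch using only Base Julia. It is not taken from the linked tutorial; `foreach_bounded` and its `ntasks` keyword are hypothetical names made up for this example. A fixed number of worker tasks pull items from a shared Channel, so at most `ntasks` simulations (and their solutions) are alive at any time.

```julia
# Hypothetical helper: apply `f` to every item, with at most `ntasks`
# calls running at the same time.
function foreach_bounded(f, items; ntasks::Integer = 4)
    # Buffer all items up front so that `put!` never blocks.
    jobs = Channel{Any}(length(items))
    for item in items
        put!(jobs, item)
    end
    close(jobs)  # workers stop once the channel is drained

    # Spawn `ntasks` workers; each takes the next item when it is free.
    @sync for _ in 1:ntasks
        Threads.@spawn for item in jobs
            f(item)
        end
    end
    return nothing
end

# Demo with a stand-in workload that records the peak concurrency.
running = Threads.Atomic{Int}(0)
peak = Threads.Atomic{Int}(0)
done = Threads.Atomic{Int}(0)

foreach_bounded(1:20; ntasks = 2) do i
    n = Threads.atomic_add!(running, 1) + 1  # atomic_add! returns the old value
    Threads.atomic_max!(peak, n)
    sleep(0.01)                              # stand-in for solve + save_to_file
    Threads.atomic_sub!(running, 1)
    Threads.atomic_add!(done, 1)
end
```

In real use the body would be something like `save_to_file(solve(prob))`, and `ntasks` would be chosen so that that many solutions fit in memory at once.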

iHany · December 22, 2021, 2:42pm

Thank you all.

@tkf, your examples are so straightforward and helpful.
Simple changes like the ones you suggested seem to solve most errors in my case (especially NOT computing all trajectories before dumping them).

Thank you

One more question: your example uses Threads.@threads. Instead, can I use tcollect in that example?

tkf · December 22, 2021, 10:00pm

Yes, you can use tcollect but it’s not super useful when there’s no output. If there are no outputs, you can use simpler APIs like Threads.@threads and Folds.foreach (the latter may be nice for load-balancing).
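A rough sketch of the no-output case with Folds.foreach could look like this. It assumes Folds.jl is installed; `solve_stub` and the atomic counter are hypothetical stand-ins for `solve` and `save_to_file`, used only so the snippet runs on its own.

```julia
using Folds

systems = 1:8                      # stand-in for a collection of problems
solve_stub(prob) = prob^2          # hypothetical stand-in for solve(prob)
total = Threads.Atomic{Int}(0)     # thread-safe accumulator instead of a file

Folds.foreach(systems) do prob
    sol = solve_stub(prob)
    Threads.atomic_add!(total, sol)  # stand-in for save_to_file(sol)
end
```

Nothing is collected here, so each `sol` can be garbage-collected as soon as its iteration finishes.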

On the other hand, for example, tcollect could be useful if you want a list of file paths:

paths = systems |>
    Enumerate() |>
    Map() do (i, prob)
        sol = solve(prob)
        if isgood(sol)
            path = "data-$i.???"
            save(path, sol)
            return path
        else
            return nothing
        end
    end |>
    Filter(!isnothing) |>
    tcollect
