GCC vs Threads.@threads vs Threads.@spawn for large loops

A statically scheduled loop in OpenMP splits the loop range over num_threads and typically uses a tree to fan out the work to the threads in parallel. At the end of the loop, the inverse of the tree is typically used for a barrier. This broadcast-barrier pair of synchronization constructs is pretty much the entire overhead for the loop, and both are very well studied: each takes only hundreds to a few thousand cycles, depending on the processor and the number of threads.
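For reference, the closest Julia analogue of such a statically scheduled loop is `Threads.@threads :static`, which likewise carves the iteration range into one contiguous chunk per thread up front. A minimal sketch (the function name, the coefficient, and the array sizes are made up for illustration):

```julia
using Base.Threads

# Statically scheduled loop: @threads :static splits eachindex(y, x) into
# nthreads() contiguous chunks at loop entry, one per thread, much like
# OpenMP's schedule(static).
function axpy_static!(y, a, x)
    @threads :static for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end

x = rand(1_000_000)
y = zeros(1_000_000)
axpy_static!(y, 2.0, x)
```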

But what happens if, in the loop body, you call another function that also has a parallel loop? With OpenMP you have to analyze your call graph, determine the thread allocation at each level, and then carefully set up thread affinities, and possibly environment variables for libraries, etc., in order to use static scheduling all the way down. It is not impossible, just very, very hard. So OpenMP added tasks and teams, but those aren't pervasive in libraries anyway.

What I'm getting at is that it isn't just variable-duration loop iterations that require dynamic scheduling of the sort Julia's scheduler manages; nested parallel calls do too, as sketched below.
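Here is a minimal sketch of that nested case in Julia; the function names, the `sin` kernel, and the sizes are made up. On recent Julia versions (1.8+, where `:dynamic` is the default schedule) the inner parallel loop simply contributes tasks to the same thread pool as the outer one, with no affinity or environment setup:

```julia
using Base.Threads

# A hypothetical "library" routine that is itself parallel.
function inner!(v)
    @threads :dynamic for i in eachindex(v)
        v[i] = sin(v[i])
    end
    return v
end

# An outer parallel loop whose body calls the parallel inner routine.
# All the nested tasks are multiplexed onto the same thread pool by
# Julia's dynamic scheduler; nothing else needs to be configured.
function outer!(vs)
    @sync for v in vs
        @spawn inner!(v)
    end
    return vs
end

vs = [rand(10_000) for _ in 1:32]
outer!(vs)
```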

As you say, ‘not everything can be done at once’, or IOW, there’s no magic bullet. Nonetheless, we still hope to improve the common case.