I’m trying to understand the multithreading options in Julia. It seems there are at least three: Threads.@threads, Polyester.@batch, and LoopVectorization.@tturbo. From what I’ve read, @batch can be faster than Threads.@threads. Can anyone give a quick summary of the pros and cons of these three methods and when each might be preferred?
Bump. Why does Polyester.@batch have lower overhead than Threads.@threads? Does this lower overhead make it less flexible? And how does it compare with OpenMP applied to for-loops in C/C++/Fortran?

P.S. I found some information about the difference between Polyester and LoopVectorization. Otherwise, does Polyester aim to be an eventual replacement for Threads.@threads?
@Elrod is the mastermind behind these tools, so he can better answer how they evolved, but here is a quick description of how I use them.
@tturbo is threading + SIMD instructions (CPU instructions that act simultaneously on 4 or 8 neighboring array elements). It is just a threaded version of @turbo, and it uses Polyester for the threading. It is meant for parallelizing simple inner loops, typically operations where a single execution of the bare loop takes no more than a couple hundred nanoseconds. You would probably never use @tturbo on something that is not an array of numbers.
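To make that concrete, here is a minimal sketch of @tturbo on a simple elementwise loop over an array of floats (the kernel and its name are my own example, not from this thread):

```julia
using LoopVectorization  # provides @turbo and @tturbo

# Threaded + SIMD elementwise kernel: each iteration is only a couple
# of instructions, exactly the kind of loop @tturbo targets.
function relu!(y, x)
    @tturbo for i in eachindex(x)
        y[i] = max(x[i], 0.0)
    end
    return y
end
```

Dropping the extra `t` (`@turbo`) gives the single-threaded SIMD version of the same loop.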
Polyester.@batch hijacks the threads provided by Julia and uses a much faster, but simpler, scheduler. It simply does not provide as many ways to nest threads as @threads. Because of its simplicity it has drastically lower overhead, so it is useful for multithreading things that are already very fast (where setting up the @threads scheduling might take longer than your fast operation itself). Usually you should use @batch only if your threaded jobs might often be small.
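A sketch of what that looks like in practice, assuming Polyester is installed (the kernel is my own example):

```julia
using Polyester  # provides @batch

# A cheap per-element operation: @threads' scheduling overhead could
# rival the work itself, while @batch's lightweight scheduler keeps
# the fixed cost small.
function axpy!(y, a, x)
    @batch for i in eachindex(y)
        y[i] = a * x[i] + y[i]
    end
    return y
end
```

Swapping `@batch` for `Threads.@threads` here would be correct too; the difference only shows up in how much time the scheduler itself eats relative to the loop body.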
You can nest @batch inside of @threads, but the scheduling of the threads might get very confused. I usually just disable the Polyester threads when I do such nesting. I think the documentation (and accompanying benchmarks) of this thread-disabling feature (implemented 2 days ago) would be of interest to you: GitHub - JuliaSIMD/Polyester.jl: The cheapest threads you can find! (at the bottom of the README)
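If I am reading the Polyester README correctly, the thread-disabling feature is `Polyester.disable_polyester_threads`, used as a do-block. A hedged sketch of the nesting pattern described above (both functions are my own examples):

```julia
using Polyester

# Inner kernel that uses @batch.
function scale!(x, a)
    @batch for i in eachindex(x)
        x[i] *= a
    end
    return x
end

# Outer parallelism via Threads.@threads. Disabling Polyester's threads
# makes the inner @batch loop run serially on the current thread, so the
# two schedulers don't fight over the same pool of Julia threads.
function scale_all!(xs, a)
    Threads.@threads for x in xs
        Polyester.disable_polyester_threads() do
            scale!(x, a)
        end
    end
    return xs
end
```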
@tturbo is best applied to the outermost loop where it is valid. It may parallelize any of the loops in that nest. In this way, it is different from @threads and @batch, which parallelize only the loop they are applied to.
Otherwise, @Krastanov’s summary is good.
When it is valid, @tturbo will probably generally be the fastest.
@tturbo also handles reductions, e.g.

```julia
function mysum(x)
    s = 0.0
    @tturbo for i = eachindex(x)
        s += x[i]
    end
    s
end
```

which will either not work or lead to incorrect answers if you use @threads.
In general, you should be able to add or remove @tturbo from a loop without changing the behavior. @tturbo does the most, so it is the most vulnerable to bugs, which makes this property helpful: if your answer changes when you add or remove @tturbo, it is @tturbo’s fault rather than your own.
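That property suggests a cheap sanity check: compare the @tturbo version against plain Base code. A sketch (`mysum` is the reduction from earlier in the thread, repeated here so the snippet is self-contained):

```julia
using LoopVectorization

function mysum(x)
    s = 0.0
    @tturbo for i in eachindex(x)
        s += x[i]
    end
    return s
end

x = rand(10_000)
# SIMD and threading reassociate the floating-point sum, so compare
# with isapprox (≈) rather than exact equality.
@assert mysum(x) ≈ sum(x)
```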
Thank you for the explanation. It sounds like @tturbo or @batch are the preferred ways to parallelize simple loops, while @threads gives more flexibility for complex loops.