Threading: Threads.@threads vs Polyester.@batch vs LoopVectorization.@tturbo

I’m trying to understand the multithreading options in Julia. It seems there are at least three: Threads.@threads, Polyester.@batch, and LoopVectorization.@tturbo. From what I’ve read, @batch can be faster than Threads.@threads. Can anyone give a quick summary of the pros and cons of these three methods and when each might be preferred?


Bump. Why does Polyester.@batch have lower overhead than Threads.@threads? Does this lower overhead make it less flexible? And how does it compare with OpenMP applied to for-loops in C/C++/Fortran?

P.S. I found some information about the difference between Polyester and LoopVectorization. Beyond that, does Polyester aim to be an eventual replacement for Threads.@threads?

@Elrod is the mastermind behind these tools, so he can better answer how they evolved, but here is a quick description of how I use them.

@tturbo is threading+SIMD instructions (CPU instructions that act simultaneously on 4 or 8 neighboring array elements). It is just a threaded version of @turbo and it uses Polyester for the threading. It is meant for parallelizing simple inner loops, which typically are operations where a single execution of the bare loop takes no more than a couple hundred nanoseconds. You would probably never use @tturbo on something that is not an array of isbits objects.
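For instance, a minimal sketch of the kind of loop @tturbo is aimed at (scale! and its arguments are made up for illustration):

using LoopVectorization

# a simple inner loop over isbits elements: threaded + SIMD via @tturbo
function scale!(y, a, x)
    @tturbo for i in eachindex(y)
        y[i] = a * x[i]
    end
    return y
end

scale!(zeros(10^6), 2.0, rand(10^6))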

Polyester.@batch hijacks the threads provided by Julia and uses a much faster but simpler scheduler. It simply does not provide as many ways to nest threads as @threads. Because of its simplicity it has drastically lower overhead, so it is useful for multithreading things that are already very fast (setting up the @threads scheduling might take longer than your fast operation itself). Usually you should use @batch only when your threaded jobs are often small.
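For example, a sketch of a kernel small enough that the @threads overhead could dominate (axpy! is a made-up name):

using Polyester

# y .= a .* x .+ y, scheduled with Polyester's low-overhead threads
function axpy!(y, a, x)
    @batch for i in eachindex(y)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end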

You can nest @batch inside of @threads, but the scheduling of the threads might get very confused. I usually just disable the Polyester threads when I do such nesting. I think the documentation (and accompanying benchmarks) of this thread-disabling feature (implemented 2 days ago) would be of interest to you: https://github.com/JuliaSIMD/Polyester.jl#disabling-polyester-threads (at the bottom of the README).
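A sketch of that pattern, assuming inner_work! is some function that uses @batch internally:

using Polyester, Base.Threads

# hypothetical inner kernel threaded with @batch
function inner_work!(x)
    @batch for i in eachindex(x)
        x[i] = sqrt(abs(x[i]))
    end
end

chunks = [rand(1_000) for _ in 1:64]
# turn off Polyester's threads so only Threads.@threads schedules work;
# the inner @batch loops then run serially on each task
Polyester.disable_polyester_threads() do
    @threads for i in eachindex(chunks)
        inner_work!(chunks[i])
    end
end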


@tturbo is best applied to the outermost loop of a nest where it is valid.
It may then parallelize any of the loops in the nest, not just the one it is attached to.

In this way, it is different from @simd.
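For example, here is a sketch based on the matrix-multiply kernel from the LoopVectorization docs (using its exported indices helper):

using LoopVectorization

function mygemm!(C, A, B)
    # @tturbo is attached to the outermost loops, but it is free to
    # thread and/or vectorize any loop in the nest
    @tturbo for n in indices((C, B), 2), m in indices((C, A), 1)
        Cmn = zero(eltype(C))
        for k in indices((A, B), (2, 1))
            Cmn += A[m, k] * B[k, n]
        end
        C[m, n] = Cmn
    end
    return C
end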

Otherwise, @Krastanov’s summary is good.

When it is valid, @tturbo will generally be the fastest.
@tturbo also handles reductions, e.g.

using LoopVectorization

function mysum(x)
    s = 0.0                        # reduction accumulator
    @tturbo for i in eachindex(x)
        s += x[i]                  # @tturbo handles this threaded reduction correctly
    end
    return s
end

which will either not work or lead to incorrect answers if you use @batch or Threads.@threads instead.
In general, you should be able to add or remove @tturbo from a loop without changing the result.
@tturbo does the most transformation, so it is the most vulnerable to bugs, which makes this guarantee helpful: if your answer changes when you add or remove @tturbo, the fault is @tturbo’s rather than your own.
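A quick sanity check in that spirit:

x = rand(10_000)
mysum(x) ≈ sum(x)  # should be true; if toggling @tturbo changes this, suspect @tturbo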


Thank you for the explanation. It sounds like @tturbo and @batch are the preferred ways to parallelize simple loops, while @threads gives more flexibility for complex loops.