The purpose of ThreadingUtilities.jl and Polyester.jl

Gregstrq · April 8, 2022, 2:09pm

As far as I understand, both ThreadingUtilities.jl and Polyester.jl provide low-overhead threading.

Could someone (@Elrod) elaborate on the precise difference between the packages and when one should be preferred over the other?

What package should I use if I want the static threads with statically pinned tasks?

goerch · April 8, 2022, 2:35pm

It depends ;) We tried to discuss something similar recently, which gave me some hints but no final conclusion. OK, I have to admit that ThreadingUtilities.jl is new to me.

Elrod · April 8, 2022, 2:59pm

ThreadingUtilities.jl only provides a low level API, which packages like Polyester.jl, LoopVectorization.jl, and Octavian.jl use to provide more convenient APIs.

If you want one of the conveniently provided APIs, use them. If they don’t work for your use case, you can use ThreadingUtilities.jl directly (or perhaps in conjunction with PolyesterWeave.jl, which Polyester.jl, LoopVectorization.jl, and Octavian.jl also use).

Gregstrq · April 8, 2022, 3:56pm

Cool!

A couple of questions then, @Elrod .

If I pin the Julia threads, will the pinning carry over to the ThreadingUtilities.jl threads?
I can imagine running julia -t nthreads with numactl or taskset.
There is also the approach used in ThreadPinning.jl, which is based on querying the CPU id with sched_getcpu and pinning the thread using uv_thread_setaffinity.
Let’s say I repeatedly run @batch per=thread for i=1:4000 with 4 threads. Will the range 1:1000 always be processed on thread 1, 1001:2000 on thread 2, and so on?

carstenbauer · April 8, 2022, 4:45pm

The mentioned packages just manage the available pool of Julia threads (specified via julia -t) and don’t create any new threads. So, if you pin the Julia threads by whatever method you’ve also pinned the “ThreadingUtilities.jl threads” because they are the same threads.

That’s a matter of the scheduling logic and what you describe is what e.g. @threads :static gives you. I believe @batch works in the same way but @Elrod will know better.
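The static assignment can be checked directly on your own machine. Below is a minimal sketch using only Base (no Polyester.jl), assuming Julia was started with several threads, e.g. julia -t 4. The helper name owners is made up for this illustration. It records which thread handled each index under @threads :static and then repeats the loop to see whether the assignment ever changes.

```julia
using Base.Threads

# Hypothetical helper: record which thread handles each index
# of 1:n under static scheduling.
function owners(n)
    owner = zeros(Int, n)
    @threads :static for i in 1:n
        owner[i] = threadid()
    end
    return owner
end

first_run = owners(4000)

# Repeat the loop and check that the index-to-thread assignment never changes.
stable = all(owners(4000) == first_run for _ in 1:100)

println("threads: ", nthreads(), ", stable assignment: ", stable)
```

This only says something about @threads :static. Whether @batch behaves the same way would need the analogous check with Polyester.jl loaded.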

Gregstrq · April 8, 2022, 5:19pm

The reason I am asking about static scheduling is the following. I want to make sure that the set of CPU cores sharing a common L3 cache always processes the same chunk of data.

So, first I need to pin the threads. (Which I understand how to do, thanks to your package, @carstenbauer.)
Then, I need to schedule the tasks in such a manner that the threads sharing a common L3 cache work on the same chunk of data.
Finally, I want to repeat the computations and make sure that this assignment of data chunks to thread groups stays the same.

Elrod · April 8, 2022, 6:07pm

Yes, it does static scheduling.
If you have 4 threads total, it’ll be

Range	Threads.threadid()
1:1000	2
1001:2000	3
2001:3000	4
3001:4000	1

The first thread tells the other three to do work, and then begins on the last chunk itself.
It goes through PolyesterWeave.jl, so if you have many nested threaded programs that each use only a few threads (e.g. maybe you have 16 threads total but each place in your code only uses 4 threads), then of course it’s harder to predict where any particular set will land. In general, though, the pattern will be like this: the leading threads get the first chunks, and the thread that actually ran the @batch code runs the rest.
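The pattern in the table above can be written down as plain arithmetic. The following is only a sketch of the described layout, not Polyester.jl's actual implementation: batch_layout is a hypothetical helper, the calling thread is assumed to be thread 1, and the handling of a remainder (extra iterations going to the leading chunks) is an assumption.

```julia
# Hypothetical helper (not part of Polyester.jl): reproduces the pattern in the
# table above for a loop over 1:n split across nthreads threads.
function batch_layout(n, nthreads)
    len, extra = divrem(n, nthreads)
    layout = Vector{Pair{UnitRange{Int},Int}}()
    start = 1
    for chunk in 1:nthreads
        # Assumption: any remainder is spread over the leading chunks,
        # so the last chunk is never the largest.
        stop = start + len - 1 + (chunk <= extra ? 1 : 0)
        # Chunks 1..nthreads-1 go to threads 2..nthreads; the last chunk
        # stays on the calling thread (assumed to be thread 1).
        tid = chunk < nthreads ? chunk + 1 : 1
        push!(layout, (start:stop) => tid)
        start = stop + 1
    end
    return layout
end

for (range, tid) in batch_layout(4000, 4)
    println(range, " => thread ", tid)
end
```

With nested threaded code going through PolyesterWeave.jl, the thread ids actually handed out can differ, so treat this as the simple non-nested case only.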

So if you had a 5950X, which has an L3 cache for cores 0-7, and a second for cores 8-15, the first and last 7 groups would run on the last L3, while the second through ninth would run on the first L3.
If this is too inconvenient, I’d accept a PR changing this.
But it’d ideally be accompanied by benchmarks.

Possible issues that could come up from changing this:

The first thread might get started last. By handling the remainder, it then does the least work to compensate. On the other hand, maybe it takes more time for other threads to properly get started because of latency in communication.
Maybe we should pay more attention to alignment of split chunks when iterating over arrays. Or at least have the option to preserve it.

Gregstrq · April 8, 2022, 7:25pm

It seems that another option is to combine distributed and threaded computing. We can create one Julia process per L3 cache and pin that process's threads to the corresponding cores. The threads in different L3 caches would be isolated from each other by the fact that they belong to different processes.
The downside is, of course, that it is harder to share data between the processes. However, if there is not a lot of communication between different thread groups (corresponding to different L3 caches), then it should be OK.
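A rough sketch of the process-per-L3 idea, using only the Distributed standard library. The numbers here (two L3 domains, two threads per worker) are placeholders, not a recommendation. The sketch only starts the workers with their own thread pools and checks the thread counts. Pinning each worker's threads to the cores of one L3 cache would be a separate step (for example with taskset or ThreadPinning.jl) and is not shown.

```julia
using Distributed

# Assumption: two L3 domains and two threads per worker; adjust for real hardware.
n_l3_domains = 2
threads_per_worker = 2

# Start one worker process per L3 domain, each with its own thread pool.
pids = addprocs(n_l3_domains; exeflags=`--threads=$threads_per_worker`)

# Ask each worker how many threads it was started with.
thread_counts = [remotecall_fetch(() -> Threads.nthreads(), p) for p in pids]
println(pids .=> thread_counts)

# Shut the workers down again.
rmprocs(pids)
```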

@carstenbauer Do you think a function to add processes with pinned threads could be a useful addition to your ThreadPinning.jl?

carstenbauer · April 8, 2022, 7:32pm

I want to keep ThreadPinning.jl as slim as possible and focused on threads. So adding processes (via Distributed) isn’t part of its scope. But maybe there is (or should be) a package for managing processes (and threads). Feel free to start it.

Gregstrq · April 10, 2022, 2:12pm

No problem.

But at the moment I literally want to add a single function, so it seemed strange to create a separate package for that.
