Discussion on ThreadPools.jl

Continuing the discussion from Multithreaded compute farm:

@tro3 I don’t think the change in Julia 1.7 that allows migration of tasks across threads is particularly relevant to ThreadPools. The example overhead I was mentioning was that, since migration is only allowed for non-sticky tasks, and since ThreadPools needs to use sticky tasks by design (to restrict which worker threads the Julia runtime may use), this new feature is unavailable to code using ThreadPools.

I think the latest discussion that is relevant to ThreadPools is Severe thread starvation issues · Issue #41586 · JuliaLang/julia · GitHub (though maybe you are already aware of this).

Does the “migration” just mean a thread can wake up on any available thread, rather than the one first assigned? I agree that is an improvement to Julia, but that would seem to make keeping Tasks off the primary thread more difficult. I see - hence the sticky bit. Okay, I’ll have to chew on that.

I’ll look at the starvation issue. That one does indeed look interesting.

After reading the thread starvation issue #41586 carefully, I am still unable to understand what the new information there is.

E.g. Wikipedia writes under “Cooperative multitasking”:

As a cooperatively multitasked system relies on each process regularly giving up time to other processes on the system, one poorly designed program can consume all of the CPU time for itself, either by performing extensive calculations or by busy waiting;

It seems to me that this is the case for the described system and the example code. If that is true, then inserting GC safepoints will not be enough; yields are necessary. (But other code may benefit from safepoints, e.g.: Multi-threadded processing of large number of files faster when reading in batch )

Am I missing something? (To be honest, I do not understand everything, especially in vchuravy’s comment, or what exactly @threads begin ... end is. As far as I know, @threads only works on for loops, not on begin blocks.)

Could you please explain a bit what this issue is about?

To be a bit nitpicky, it means that “a task can wake up on any available (OS) thread.”

Or more concretely, migration of task means

Threads.@spawn begin
    i = Threads.threadid()
    yield()
    j = Threads.threadid()
    @assert i == j
end

can potentially throw as of Julia 1.7. This was not the case up to 1.6.

I think it’s fine for ThreadPools.jl since the task.sticky = true trick (or the public API @async) still works (i.e., the task will not be migrated).
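A minimal sketch of the sticky-task behavior described above. The helper name `threadids_around_yield` is made up for illustration, and this is not how ThreadPools.jl itself is implemented; it only shows that a sticky task (created with `@async`, or with `Task` plus `sticky = true`) reports the same thread id before and after a `yield()`.

```julia
# Hypothetical helper: record the thread id before and after a yield point.
function threadids_around_yield()
    before = Threads.threadid()
    yield()
    after = Threads.threadid()
    return (before, after)
end

# @async creates a sticky task, so it stays on the thread it started on.
t = @async threadids_around_yield()
before, after = fetch(t)
@assert t.sticky
@assert before == after

# The same thing done by hand: build a Task, mark it sticky, schedule it.
t2 = Task(threadids_around_yield)
t2.sticky = true
schedule(t2)
before2, after2 = fetch(t2)
@assert before2 == after2
```

Swapping `@async` for `Threads.@spawn` gives a non-sticky task, for which the equality is no longer guaranteed on Julia 1.7 and later.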

Yes, that’s right. But GC was part of the discussion since it can also manifest as a scheduling problem. As you pointed out, we need to insert yield points (or support preemptive scheduling) if we want to do everything in Julia’s task scheduler to achieve some kind of fairness. Another option is to just let the OS do it for a limited set of tasks, which is what JeffBezanson was suggesting.
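To illustrate what “inserting yield points” could look like in user code, here is a rough sketch. The function `busy_sum` and its `yield_every` keyword are hypothetical names chosen for this example, and the interval is arbitrary; the point is only that a long-running loop periodically calls `yield()` so other tasks on the same thread get a chance to run.

```julia
# Hypothetical long-running computation with explicit yield points.
function busy_sum(n; yield_every = 10_000)
    s = 0
    for i in 1:n
        s += i
        # Give the scheduler a chance to run other tasks every so often.
        i % yield_every == 0 && yield()
    end
    return s
end

result = busy_sum(100; yield_every = 10)
@assert result == 5050
```

Without the `yield()` call, a loop like this never returns control to the scheduler until it finishes, which is the cooperative-multitasking problem described in the Wikipedia quote above.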

Yeah, yeah - that’s what I meant. :slight_smile: Thx

I think that was a proposal to let you create a new thread in which to execute a begin block: not just a new task, but a new OS thread. But I could be wrong.

I do think a macro @withOSthread begin ... end would be useful, and I like using a name like that to make it clear what it does. If you want to spawn off a big computation that could take hours, that’d be a hugely useful way to isolate it from other mixed tasks that might be going on.
