[ANN] ThreadPools.jl - Improved thread management for background and nonuniform tasks

I am happy to announce the registration of ThreadPools.jl.

As of Julia 1.3, the algorithm of the @threads macro is to pre-divide the incoming range into equal partitions running on each thread. This is very efficient for most use cases, but there are a couple of scenarios where this is a problem:

  • When something else is running on the primary thread, like a GUI or a web server
  • When the tasks are very nonuniform - this can lead to one thread working on the longer tasks while the others sit idle

ThreadPools.jl exposes @fgthreads and @bgthreads, both of which actively manage the thread pool so that when one task finishes, the next one in the queue starts. If one thread is hit with a long task, the others will keep processing the shorter tasks and if a second long one comes along, it will be processed in parallel with the first long one.

I had to produce this package for myself, as I hit both of the above pain points at the same time in my use case. This community has been very helpful, so I figured I’d fill out the corners, document it, and see if you all find it useful. If so, I am committed to keeping it maintained for as long as needed. This is my first registered package, so if there is anything I missed in making this more useful, don’t hesitate to let me know.

Usage

Basic usage is pretty easy. Just replace @threads or map with @fgthreads/@bgthreads or fgmap/bgmap (bg versions are background, keeping the primary thread free):

julia> @bgthreads for x in 1:3
         println("$x $(Threads.threadid())")
       end
2 3
3 4
1 2

julia> bgmap([1,2,3]) do x
         println("$x $(Threads.threadid())")
         x^2
       end
2 3
3 4
1 2
3-element Array{Int64,1}:
 1
 4
 9

Logging

I found this very useful. There are logging versions of all the functions to help tune performance:

julia> ThreadPools.logfgforeach(x -> sleep(0.1*x), "log.txt", 1:8)

julia> ThreadPools.showstats("log.txt")

    Total duration: 1.217 s
    Number of jobs: 8
    Average job duration: 0.46 s
    Minimum job duration: 0.112 s
    Maximum job duration: 0.805 s

    Thread 1: Duration 1.217 s, Gap time 0.0 s
    Thread 2: Duration 0.827 s, Gap time 0.0 s
    Thread 3: Duration 0.613 s, Gap time 0.0 s
    Thread 4: Duration 1.023 s, Gap time 0.0 s

julia> ThreadPools.showactivity("log.txt", 0.1)
0.000   -   -   -   -
0.100   4   2   1   3
0.200   4   2   5   3
0.300   4   6   5   3
0.400   4   6   5   7
0.500   8   6   5   7
0.600   8   6   5   7
0.700   8   6   -   7
0.800   8   6   -   7
0.900   8   -   -   7
1.000   8   -   -   7
1.100   8   -   -   -
1.200   8   -   -   -
1.300   -   -   -   -
1.400   -   -   -   -

There is also a logging version of the original @threads, so you can measure performance of your use case with the built-in macro.

Structure

The ThreadPool is simply a Channel of Tasks that feeds to a number of Task handlers on each thread. Each handler processes the task and hands back to an output Channel. Then there is a result iterator to fetch the results:

put!(pool, fn, args...)
      |
      V
Channel{Task}
      |
      |----------------------------   ...
      |                   |
      V                   V
Thread 1 Handler   Thread 2 Handler   ...
  (optional)              |
      |                   |
      |<---------------------------   ...
      | 
      V
Channel{Task}
      |
      V
results(pool)

Pretty straightforward stuff - the channels serve as a queue of tasks to be completed, and the handlers just process them as they become available. I attempted to wrap it into an even simpler API so the under-the-hood stuff doesn’t have to be dealt with.

Anyhow, thanks for your help, and all feedback appreciated. Cheers!

35 Likes

Thanks for working on this. Does this only work when you start Julia with at least 2 threads? Does it error or deadlock in case of one thread?

1 Like

The simple API functions (@fgthreads, fgmap, @bgthreads, bgmap) all fall back gracefully in the single-thread case. (Which is a pain for testing, BTW - two differing Julia sessions needed.) Using the underlying ThreadPool structure or the logging versions will all throw an error. I was thinking of adding the gracefull fallbacks to the logging cases, if they seem useful.

Please help me fill in what I have missed. It seems that you provide two groups of threads, the “foreground threads” and the “background threads” and one “foreground thread” is distinguished as the “primary thread”. Is the primary thread always the thread with id=1? Is the primary thread task assignable (e.g. run the UI)? How does one put additional tasks into “foreground threads” and other tasks into “background threads”?

There is no explicit thread assignment capability here. (Julia as of now makes that a little difficult, but I think possible.) This just creates a pool of task handlers that manages themselves - that pool may involve the primary thread (fg) or not (bg). The fg versions dont force operation in the foregoround, they just allow it.

The primary thread is indeed id==1, and is the thread that Julia starts on. So any GUI or server async Task will be there. This may change in the future, of course, since the threading thing is new.

Does that help?

Do the bg versions disallow operation in the foreground?

Yes. That way if you have say a server running on the main thread, it won’t go unresponsive if a heavy (ie blocking) task is submitted. (This was one of the issues I had with @threads)

1 Like

Could you elaborate a little more on what the main difference with respect to the functionality of the ‘built in’ Julia scheduler is? Couldn’t a similar thing be achieved with the @spawn macro inside a loop?

Is the main benefit the possibility of subdividing into fg and bg threads?

@spawn potentially spawns the task on the main thread - the @bgthreads macro from this package prevents that.

help?> Threads.@spawn
  Threads.@spawn expr

  Create and run a Task on any available thread. [...]

Thanks you, it seems very nice and useful!

One question, the difference between fgthreads and bgthreads is that the current/first thread in the second one is not used, didn’t it?

Yes.

Ok, then it is a very intuitive concept.
Thanks you again, it is very useful when the time used for each thread could be difference.

Okay yea that’s what I meant with my second sentence, that’s nice though very useful for making sure things don’t run on the primary thread!

@Sukera is correct about @spawn. There are actually two schedulers running around Julia currently. The @spawn macro sends the task to the C-level scheduler, which is explicitly random and includes the primary. The @threads macro divvies up the work before scheduling, then schedules a chunk on each thread (again including the primary). Both schedule their work before it is known how long that work will take.

ThreadPools will only take on work once the previous jobs have finished. If every job is exactly the same size, the overhead is not worth it - stick with @threads. But if there are a few jobs way longer than the rest, I’ve found the above queuing strategy saves time (about 30% faster in my use case).

3 Likes

There are actually two axes that are different from Base here, and they do solve different problems:

  • The scheduling strategy helps in the nonuniform job case
  • Prevention of the primary thread usage helps in the “stay in the backgound” case

It occurs that one thing I did not do, but should, is include a macro for the “I have very short, uniform computations, but I need the primary thread free” scenario. For that, an alternate @threads that uses the pre-division strategy but prevents scheduling to the primary would be most efficient. Easy enough to add while I have my head wrapped around the details, but I’m not sure what to call it to differentiate it clearly from the @bgthreads macro.

2 Likes

@npthreads where np is non primary (pun intended)

1 Like

I like it. I’ll add it for v0.3.0, and I’ll upgrade the documentation to show the decision matrix, something like:

                             Uniform jobs         Non Uniform Jobs 
                          -----------------------------------------
Primary thread usable     |  @threads               @fgthreads
                          |
Primary thread not usable |  @npthreads             @bgthreads
1 Like

Alternately, if we separate the queueing and background axes, we could have:

                             Uniform jobs         Non Uniform Jobs 
                          -----------------------------------------
Primary thread usable     |  @threads               @qthreads
                          |
Primary thread not usable |  @bgthreads             @bgqthreads

the main problem - it breaks the API already. But I think it is a clearer API than what I have now once the non-primary pre-division option is added. Ah, 20/20 hindsight…

You’re in v0.2 of the package and announced it just 3 hours ago - don’t worry about it, at this point a clearer API is welcome :smiley:

3 Likes

Yes, the 0.low_digit versions often are used while getting things right.
You can modify or restructure an as-yet-unsettled API, you cannot break it.

a minor suggestion: use the q as a prefix rather than as an infix

  • renames @bgqthreads to @qbgthreads
    – this makes the table simpler to follow (imo)
1 Like