Nesting `Threads.@threads` and `Polyester.@batch` (or a context manager to limit Polyester threads)

How dangerous is it to run `Polyester.@batch` inside of `Threads.@threads`?

An example where I might need this: a library implements a low level function that uses Polyester for multithreading, and I want to run multiple such functions in parallel.

Furthermore, to make this actually useful (i.e., to keep a single `Polyester.@batch` from hogging all the threads), is it possible to control the number of Polyester threads with a context manager of some kind? E.g., how would I implement this pseudocode:

```julia
function f(arg)
    println(arg)
    Polyester.@batch for i in 1:10000
        do_something()
    end
end

Threads.@threads for j in 1:2
    set_max_polyester_threads(2)  # hypothetical: cap Polyester threads for this task
    f(j)
end
```

I want this code to run 4 threads in parallel: two threads dedicated to `f(1)` and two threads dedicated to `f(2)`. I want this to work in a situation where I cannot modify `f` itself. Is this possible?

Background: I am using Polyester because benchmarking has shown that cheap threads for the “inner” problem do provide a big performance gain. On the other hand, the “inner” problem cannot effectively use more than a handful of threads. The outer problem (running the inner problem multiple times) is embarrassingly parallel.

I had some functionality to support something like that in Polyester before, but ripped it out as it wasn’t documented and I didn’t think anyone was using it, and I didn’t feel like maintaining it at the time.

Something like that could be added again, but wouldn’t it be better to just disable inner threading entirely, focusing on outer threading of your embarrassingly parallel program only?

That’d be easier to do without modifying `f`, and in fact is something you can already do without modifying the library:

```julia
using PolyesterWeave

t, r = PolyesterWeave.request_threads(PolyesterWeave.num_threads())
# Polyester, LoopVectorization's threading, and Octavian's threading are now disabled
foreach(PolyesterWeave.free_threads!, r)
# they're now re-enabled
```
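For the original example, a minimal sketch of the whole pattern might look like the following. This is an assumption about how the pieces fit together, not a documented recipe: the inner Polyester threading is disabled around the entire outer loop, so `Threads.@threads` supplies all the parallelism and `f` itself is untouched.

```julia
using PolyesterWeave

# f is the library function from the question; it uses Polyester.@batch
# internally, but we never modify it.
t, r = PolyesterWeave.request_threads(PolyesterWeave.num_threads())
try
    Threads.@threads for j in 1:2
        f(j)  # the Polyester.@batch inside f now runs serially
    end
finally
    # always give the threads back, even if an iteration throws
    foreach(PolyesterWeave.free_threads!, r)
end
```

The `try`/`finally` is there so the threads are re-enabled even if `f` errors; without it, a single exception would leave Polyester-family threading disabled for the rest of the session.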

Yes, this perfectly addresses my use case. And you are right about just disabling the inner threads. I guess the memory pressure might be a bit worse if I disable the inner threads, but it is not worthwhile for me to profile that yet.

Also, since this definitely relies on internals, we could expose something with guaranteed stability.
Perhaps a function, `disable_polyester_threads(f::F)`.
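A minimal sketch of such a helper, built from the `PolyesterWeave` calls shown above; the name `disable_polyester_threads` and the exact request/free API are assumptions here, not a stable interface:

```julia
using PolyesterWeave

# Run f() with Polyester/LoopVectorization/Octavian threading disabled,
# then re-enable it. Sketch only; relies on PolyesterWeave internals.
function disable_polyester_threads(f::F) where {F}
    t, r = PolyesterWeave.request_threads(PolyesterWeave.num_threads())
    try
        return f()  # Polyester-family threading is disabled in here
    finally
        foreach(PolyesterWeave.free_threads!, r)  # re-enable
    end
end
```

It could then be called as `disable_polyester_threads(() -> f(j))`, or with `do` syntax, from inside each outer task.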


I will make a pull request to Polyester with this.


Here is a draft: disable_polyester_threads by Krastanov · Pull Request #86 · JuliaSIMD/Polyester.jl · GitHub

> I had some functionality to support something like that in Polyester before, but ripped it out as it wasn’t documented and I didn’t think anyone was using it, and I didn’t feel like maintaining it at the time.

I have been searching for something exactly like this. My “inner problems” are large loops which I’d like to multithread with `@tturbo`, while the outer task is just a couple of invocations of the inner loop with different arguments. Rather than doing, e.g.,

```julia
@turbo thread=40 <some work>
@turbo thread=40 <some work>
```

I would love to be able to do

```julia
@sync begin
    Threads.@spawn begin
        @turbo thread=20 <some work>
    end
    Threads.@spawn begin
        @turbo thread=20 <some work>
    end
end
```

but with the thread counts determined dynamically in the outer task.
It seems like the latter ought to give a better speedup for a range of loop sizes.
It also lets you get around Amdahl’s law better: if you have some piece of work that isn’t as large a loop but can be computed independently of the other big loops, you can run it concurrently on just one Polyester thread alongside them.
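One way to pick those thread counts dynamically in the outer task is to just partition the available Julia threads among the spawned tasks. `split_threads` below is a hypothetical helper, not part of any package; it only assumes the inner kernel accepts a count the way `thread=N` does.

```julia
# Divide `total` threads as evenly as possible among `ntasks` outer
# tasks; the first `rem` tasks get one extra thread.
function split_threads(ntasks::Integer, total::Integer = Threads.nthreads())
    base, rem = divrem(total, ntasks)
    return [base + (i <= rem ? 1 : 0) for i in 1:ntasks]
end

# e.g. split_threads(2, 40) gives [20, 20], so each of two spawned
# tasks could run its kernel with `thread=20`.
```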

You mentioned you ripped it out of Polyester, so probably your judgment about whether it belonged there was correct, but what would it take to implement the thing I’ve sketched here in my own code?

Polyester is more stable now, feel free to make a PR to add something like that back.