Control number of threads

Hi,
Let’s say I have started Julia with JULIA_NUM_THREADS=4, but in one of my loops I only want to use 2 threads because that gives better performance. How do I make Threads.@threads use only 2 threads?
Thank you

Are there any other libraries that give control over the number of threads used? I looked at FLoops but couldn’t find anything relevant to my problem.

Semaphore

You can use whatever threading solution you want (including Threads.@threads), plus a Base.Semaphore to control the number of threads active at a time.

Something like

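using Base.Threads  # `using Threads` alone fails; the module lives in Base
sem = Base.Semaphore(2)  # at most 2 at a time
Threads.@threads for ii in 1:100
    Base.acquire(sem)
    try
        println(threadid())
    finally
        Base.release(sem)  # release the slot even if the body throws
    end
end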

This ensures that only 2 threads are doing any work at a time, though which two may change (and with Threads.@threads it will, because of how it allocates work in advance).

It may not lead to optimal scheduling, however.

ThreadPools.jl

ThreadPools exposes several scheduling algorithms, and they can take a pool argument: a subset of threads that you construct in advance.

using ThreadPools
pool = ThreadPools.StaticPool(3, 2)  # use only 2 threads, starting from thread 3
@tthreads pool for ii in 1:100
    println(Threads.threadid())
end

This ensures that only threads 3 and 4 are used.


For performance, and especially in library code, a better way to limit the number of tasks spawned is to specify the base case size rather than hard-coding the number of tasks. In JuliaFolds, you can specify this via the basesize parameter in various APIs.

This is a better approach because it keeps working well when the input size changes. For an input smaller than basesize, you’d get a single-threaded program with no Task spawn overhead. For a very large input, it’ll use additional CPUs as needed.

The basesize parameter can also be used to “simulate” different JULIA_NUM_THREADS values without restarting Julia; see “Frequently asked questions”.
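To make the basesize idea concrete without relying on any package, here is a minimal hand-rolled sketch using only Base. The function name dac_sum and its default basesize are made up for illustration and are not part of JuliaFolds; it assumes a non-empty, indexable input. Inputs no longer than basesize are processed sequentially without spawning a task, and choosing basesize as half the input length yields exactly two base cases, so at most two tasks do work at once.

```julia
# Hypothetical helper (not a JuliaFolds API): sum f(x) over xs by divide and conquer.
function dac_sum(f, xs::AbstractVector; basesize::Integer = 1_000)
    basesize >= 1 || throw(ArgumentError("basesize must be at least 1"))
    if length(xs) <= basesize
        # Base case: plain sequential reduction, no task is spawned.
        return sum(f, xs)
    end
    # Split the input in half; the left half is never the smaller one.
    mid = (firstindex(xs) + lastindex(xs)) ÷ 2
    left = view(xs, firstindex(xs):mid)
    right = view(xs, (mid + 1):lastindex(xs))
    # Process the right half in a new task and the left half in the current one.
    task = Threads.@spawn dac_sum(f, right; basesize = basesize)
    return dac_sum(f, left; basesize = basesize) + fetch(task)
end

xs = collect(1:10_000)
# Two base cases of 5_000 elements each: behaves like a 2-thread run.
total = dac_sum(abs2, xs; basesize = cld(length(xs), 2))
```

Because the limit is expressed as a chunk size rather than a task count, the same function stays sequential for small inputs and spreads over more threads for large ones.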


This is interesting, but correct me if I am wrong: what you describe only works with already parallelised basic functions, whereas @oxinabox’s solution is more general. Is that correct?

These “already parallelized basic functions” are just normal Julia functions, so you can do the same for your hand-rolled function. For example, if you can rephrase your program as a divide-and-conquer algorithm, it’s often straightforward to use this strategy. Even if this does not work, as I said in Async limit - #3 by tkf, I’d recommend using the “worker pool” pattern rather than a semaphore. For more specific comments, I think it’d be helpful if you could provide an MWE.
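For reference, one common way to write the “worker pool” pattern with only Base is sketched below. It is not taken from the linked post; the name worker_pool_map and the nworkers keyword are assumptions made for illustration. A fixed number of tasks pull jobs from a shared Channel, so no more than nworkers tasks are ever running the loop body.

```julia
# Hypothetical helper: apply f to every item using a fixed number of worker tasks.
function worker_pool_map(f, items::AbstractVector; nworkers::Integer = 2)
    # Queue every (position, item) pair up front, then close the channel so
    # that workers stop once it has been drained.
    jobs = Channel{Tuple{Int,eltype(items)}}(Inf)
    for (i, x) in enumerate(items)
        put!(jobs, (i, x))
    end
    close(jobs)

    results = Vector{Any}(undef, length(items))
    @sync for _ in 1:nworkers
        # Each worker takes jobs until the channel is empty; every position is
        # written by exactly one worker.
        Threads.@spawn for (i, x) in jobs
            results[i] = f(x)
        end
    end
    return results
end

squares = worker_pool_map(x -> x^2, 1:10; nworkers = 2)
```

Unlike the semaphore approach, tasks that would only wait are never created here, and a worker picks up the next job as soon as it finishes the previous one.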


@oxinabox thank you for your feedback. Your answer works very well for me; I hope you are OK with my choosing @tkf’s answer.