Hi,
Let’s say I have started Julia with JULIA_NUM_THREADS=4, but in one of my loops I only want to use 2 threads because that gives better performance. How do I make Threads.@threads use only 2 threads?
Thank you
Are there any other libraries that give control over the number of threads used? I looked at FLoops but couldn’t find anything relevant to my problem.
Semaphore
You can use whatever threading solution you want (including Threads.@threads) plus a Base.Semaphore to control the number of threads active at a time.
Something like:
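using Base.Threads  # brings threadid() into scope; `using Threads` alone fails
sem = Base.Semaphore(2) # at most 2 at a time
Threads.@threads for ii in 1:100
    Base.acquire(sem)
    try
        println(threadid())
    finally
        # always release, even if the loop body throws
        Base.release(sem)
    end
end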
This ensures that at most 2 threads are doing work at any one time, though which two may change (and with Threads.@threads it will, because of how it allocates work in advance).
It may not lead to optimal scheduling, however.
ThreadPools.jl
ThreadPools exposes several scheduling algorithms, and they can take a pool argument: a subset of threads that you construct in advance.
using ThreadPools
pool = ThreadPools.StaticPool(3, 2) # use only 2 threads, starting from thread 3
@tthreads pool for ii in 1:100
    println(Threads.threadid())
end
This will actually ensure that only threads #3 and #4 are used.
For performance, a better way to limit the number of tasks spawned is to specify the base case size rather than hard-coding the number of tasks. In JuliaFolds, you can specify this via the basesize parameter in various APIs. This is especially true in library code.
This is a better approach since it works well even when the input size changes. For an input smaller than basesize, you’d get a single-threaded program with no Task spawn overhead. For a very large input, it’ll use additional CPUs as needed.
The basesize parameter can also be used to “simulate” different JULIA_NUM_THREADS values without restarting Julia: Frequently asked questions
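To make the basesize idea concrete, here is a minimal sketch using only Base. It is not how JuliaFolds implements it; dac_sum is a hypothetical helper written for this post, and it assumes a non-empty, 1-based collection such as a Vector. A chunk no larger than basesize is processed sequentially, and anything larger is split in half with one half handed to a new task.

```julia
using Base.Threads: @spawn

# Hypothetical helper (not part of any package): sums f(x) over xs[lo:hi].
# Assumes a non-empty, 1-based indexable collection such as a Vector.
function dac_sum(f, xs, lo = 1, hi = length(xs); basesize = 1024)
    if hi - lo + 1 <= basesize
        # Base case: small enough, run sequentially without spawning a task.
        return sum(f, view(xs, lo:hi))
    end
    mid = (lo + hi) >>> 1
    # Recursive case: the left half runs in a new task, the right half here.
    left = @spawn dac_sum(f, xs, lo, mid; basesize = basesize)
    right = dac_sum(f, xs, mid + 1, hi; basesize = basesize)
    return fetch(left) + right
end

xs = collect(1:10_000)
total = dac_sum(abs2, xs; basesize = 2_500)        # splits into 4 base-case chunks
serial = dac_sum(abs2, xs; basesize = length(xs))  # one chunk, no tasks spawned
```

With this shape, passing basesize = cld(length(xs), 2) yields two base-case chunks, which is roughly the "only 2 threads" behaviour asked about, without tying the code to a particular JULIA_NUM_THREADS.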
This is interesting, but correct me if I am wrong: what you describe only works with already parallelised basic functions, and @oxinabox’s solution is more general, correct?
These “already parallelized basic functions” are just normal Julia functions. So, you can do the same for your hand-rolled function. For example, if you can rephrase your program as a divide-and-conquer algorithm, it’s often straightforward to use this strategy. Even if this does not work, as I said in Async limit - #3 by tkf, I’d recommend using the “worker pool” pattern rather than a semaphore. For more specific comments, I think it’d be helpful if you could provide an MWE.
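For reference, the “worker pool” pattern could look roughly like the sketch below. It uses only Base; pool_map and its nworkers keyword are hypothetical names made up for this illustration, and it assumes 1-based indexing of items. A fixed number of tasks pull indices from a shared Channel, so at most nworkers items are processed concurrently. Note that this limits the number of running tasks, not which threads they run on.

```julia
using Base.Threads: @spawn

# Hypothetical helper: applies f to every item using exactly `nworkers` tasks.
function pool_map(f, items; nworkers = 2)
    jobs = Channel{Int}(length(items))   # buffered queue of indices to process
    foreach(i -> put!(jobs, i), eachindex(items))
    close(jobs)                          # workers stop once the queue drains
    results = Vector{Any}(undef, length(items))
    @sync for _ in 1:nworkers
        @spawn for i in jobs             # each worker pulls the next available index
            results[i] = f(items[i])
        end
    end
    return results
end

squares = pool_map(x -> x^2, 1:10; nworkers = 2)
```

Unlike the semaphore version, no task is created just to sit waiting for a slot: the two workers keep taking the next index until the queue is empty, which also balances the load when items take different amounts of time.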