There’s another post with the same name [1], but I couldn’t get the answer I want out of it, even though it has a reply marked as the “answer”.
My situation is that I would like to have control over the number of threads used by Threads.@threads. I have multiple bigjob(arg) calls that I want to run; they’re all independent of each other, they share no resources, and I don’t care what order they’re run in.
Elsewhere I have a function which contains
Threads.@threads for arg in args
    bigjob(arg)
end
How many threads are used here?
For my research I normally run julia from VS Code with julia -t 10, but I don’t necessarily want to use 10 threads every time I call Threads.@threads, because I might have other stuff going on at any given time. Given how busy my computer is, I have some intuition about how many concurrent bigjobs I can afford.
Is there some syntax that gives me some control over this at call time? I imagine something like
Threads.@threads :njobs=4 for arg in args
    bigjob(arg)
end
or something? What do people do when they want to use fewer concurrent jobs than the number of threads their julia was started with?
The @threads macro will spawn as many tasks as you have threads. If you use @threads :static, each thread processes one task. If you use @threads :dynamic, any available thread may process any of the tasks.
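For reference, the schedule is passed as the first argument to the macro; a minimal sketch reusing the args/bigjob names from the question:

Threads.@threads :static for arg in args   # iteration chunks are pinned to threads
    bigjob(arg)
end

Threads.@threads :dynamic for arg in args  # the default; chunks may run on any available thread
    bigjob(arg)
end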
If you want finer-grained control over the number of tasks, just use Threads.@spawn in your for-loop. That is, instead of:
Threads.@threads for i in lots_of_work
    bigjob(i)
end
You can do:
@sync for t in fewer_tasks
    Threads.@spawn bigjob(t)
end
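For example, to cap the number of concurrent bigjobs at four, you can partition args into that many chunks yourself and spawn one task per chunk. A minimal sketch (the chunk count of 4 is arbitrary):

ntasks = 4  # run at most 4 bigjobs concurrently
chunks = Iterators.partition(args, cld(length(args), ntasks))

@sync for chunk in chunks
    Threads.@spawn foreach(bigjob, chunk)
end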
using OhMyThreads: @tasks, @set
using Base.Threads: nthreads

# Variant 2: macro API
function mc_parallel_macro(N; ntasks=nthreads())
    # Monte Carlo estimate of pi: count points that land inside the unit circle
    M = @tasks for i in 1:N
        @set begin
            reducer = +
            ntasks = ntasks
        end
        rand()^2 + rand()^2 < 1.0
    end
    pi = 4 * M / N
    return pi
end
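A hypothetical call that limits the parallelism to four tasks, regardless of how many threads julia was started with:

mc_parallel_macro(10^8; ntasks = 4)  # ≈ 3.1416, computed by at most 4 parallel tasks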
Correct me if I’m wrong, but in @lmiq’s example the work is first split into chunks and then sent to threads…? This isn’t great if the work is not homogeneous in size (imagine you have some tasks that are 10x as long as others).
Doesn’t Julia have the pattern built in where you create N threads, have them loop consuming from a channel, and then exit the threads?
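That pattern is straightforward to write by hand with a Channel; here is a minimal sketch (the run_with_workers name and the worker count of 4 are made up, bigjob/args are from the question):

function run_with_workers(f, items; nworkers = 4)
    ch = Channel{eltype(items)}(length(items))
    foreach(x -> put!(ch, x), items)
    close(ch)  # once drained, iteration over the channel stops and the workers exit

    @sync for _ in 1:nworkers
        Threads.@spawn for item in ch  # take!s items until the channel is closed and empty
            f(item)
        end
    end
end

run_with_workers(bigjob, args; nworkers = 4)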
using Distributed

@everywhere function bigjob(arg)
    ...
end

pmap(bigjob, args)
You can control the number of jobs either by starting julia with julia -p n_procs or by creating your own WorkerPool.
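A sketch of the WorkerPool route (the worker count and the slice are illustrative):

using Distributed

addprocs(8)                        # or start julia with `julia -p 8`

# assuming bigjob is already defined with @everywhere as above
pool = WorkerPool(workers()[1:4])  # restrict pmap to 4 of the 8 workers
pmap(bigjob, pool, args)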
If you want to stick to threads, then you could use asyncmap
As described in the documentation, you can even pass a function that decides how many tasks to run in parallel, so you can adjust the load based on other factors.
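For example (the load-based heuristic in the second call is just an illustration of the zero-argument function form):

# Fixed cap: at most 4 bigjobs in flight at a time
asyncmap(bigjob, args; ntasks = 4)

# Function form: checked before each element; a new task is started only
# while the returned value exceeds the number of currently running tasks
asyncmap(bigjob, args; ntasks = () -> max(1, Sys.CPU_THREADS - round(Int, Sys.loadavg()[1])))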
Starting with Julia 1.11 this is available by adding the :greedy switch to @threads as follows:
Threads.@threads :greedy for item in collection
    do_the_work(item)
end
However, this doesn’t let you set N; the number of tasks is always Threads.threadpoolsize(). See the documentation for details: Multi-Threading · The Julia Language
To control N, use OhMyThreads.jl instead. Here’s how:
using OhMyThreads
@tasks for item in collection
    @set begin
        scheduler = :greedy
        ntasks = N
    end
    do_the_work(item)
end
There’s also a macro-free form:
using OhMyThreads
tforeach(collection; scheduler=:greedy, ntasks=N) do item
    do_the_work(item)
end
You have channels (Asynchronous Programming · The Julia Language). With those you can deal with highly inhomogeneous tasks and control the number of processors used. I think (I might be wrong), though, that OhMyThreads can deal with that automatically.
As mentioned, yes we do have that. With OhMyThreads and the GreedyScheduler you can also control the number of tasks as desired.
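For example, with a hypothetical cap of four tasks:

using OhMyThreads

tforeach(bigjob, args; scheduler = GreedyScheduler(ntasks = 4))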
Keep in mind, though, that this pattern of greedily fetching from a channel is in many cases pretty inefficient, and it’s often (but certainly not always!) best to just spawn more tasks and break up the work that way, unless you have an extremely imbalanced workload. Julia’s scheduler has no problem with having more active tasks than threads.
You can also use the :roundrobin iteration order to shuffle your data between the tasks, which can help with load balancing.
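If I read the DynamicScheduler docs correctly, that would look something like this (the split keyword accepting :roundrobin is from recent OhMyThreads/ChunkSplitters versions, so treat this as a sketch):

using OhMyThreads

# elements are dealt out round-robin across the chunks instead of in contiguous blocks
tforeach(bigjob, args; scheduler = DynamicScheduler(split = :roundrobin))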