It’s certainly possible to combine shared- and distributed-memory parallelism (i.e. threads and Julia workers) in Julia. The number of threads on worker processes is determined by the JULIA_NUM_THREADS environment variable. So on Linux I can launch two workers, each running 4 threads, as follows:
From the shell, run:
export JULIA_NUM_THREADS=4
Then launch Julia from that shell session and run addprocs as usual; your workers will each have 4 threads. The following trivial example demonstrates this:
julia> using Distributed
julia> addprocs(2)
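julia> @everywhere begin
           function print_id_2(x)
               pid = Distributed.myid()
               nth = Threads.nthreads()
               # One iteration per thread; index x by the loop variable i,
               # since threadid() is not guaranteed to match i on newer Julia versions.
               Threads.@threads for i in 1:nth
                   tid = Threads.threadid()
                   println("Hello from thread $tid of $nth on worker $pid. $(x[i]) is from a vector")
               end
           end
       end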
julia> xv = [(i-1)*4+1:4*i for i in 1:3]
3-element Array{UnitRange{Int64},1}:
 1:4
 5:8
 9:12
julia> pmap(print_id_2, xv);
From worker 3: Hello from thread 1 of 4 on worker 3. 1 is from a vector
From worker 3: Hello from thread 4 of 4 on worker 3. 4 is from a vector
From worker 2: Hello from thread 1 of 4 on worker 2. 5 is from a vector
From worker 3: Hello from thread 2 of 4 on worker 3. 2 is from a vector
From worker 3: Hello from thread 3 of 4 on worker 3. 3 is from a vector
From worker 2: Hello from thread 2 of 4 on worker 2. 6 is from a vector
From worker 2: Hello from thread 4 of 4 on worker 2. 8 is from a vector
From worker 2: Hello from thread 3 of 4 on worker 2. 7 is from a vector
From worker 3: Hello from thread 2 of 4 on worker 3. 10 is from a vector
From worker 3: Hello from thread 3 of 4 on worker 3. 11 is from a vector
From worker 3: Hello from thread 1 of 4 on worker 3. 9 is from a vector
From worker 3: Hello from thread 4 of 4 on worker 3. 12 is from a vector
Unfortunately, as far as I can tell, the only way to set the number of Julia threads is through environment variables. It would be nice to be able to pass, say, a keyword argument to addprocs specifying how many threads each new process should use, but that does not seem to be possible.
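That limitation applies to older releases. Assuming Julia 1.5 or later, where the --threads command-line flag exists, one option is to pass that flag to the workers through the exeflags keyword of addprocs. The following is a minimal sketch under that assumption; the thread count of 2 and the variable names are arbitrary choices for illustration:

```julia
using Distributed

# Assumption: Julia 1.5+, where the `--threads` (`-t`) flag is available.
# `exeflags` forwards extra command-line flags to each worker process.
new_workers = addprocs(2; exeflags="--threads=2")

# Ask every new worker how many threads it actually started with.
thread_counts = Dict(p => remotecall_fetch(Threads.nthreads, p) for p in new_workers)
println(thread_counts)
```

If every worker reports the requested count, the flag took effect and JULIA_NUM_THREADS does not need to be exported beforehand.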
I’m not sure exactly how you’d do this through a cluster manager like Slurm, but presumably there is a way to control the environment that worker processes are launched in. Assuming there is, you’d just have to make sure that JULIA_NUM_THREADS is set in that environment.
I’ve also never used Transducers.jl, but I would guess that if your Julia workers were all running multiple threads and you ran a multithreaded routine inside a distributed computation, it would just work.
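To illustrate the general pattern without Transducers.jl, here is a rough sketch that uses only Base threading inside pmap. The function name threaded_sum_of_squares and the input ranges are hypothetical, made up for this example; the idea is that pmap spreads the ranges across workers, and each worker splits its range across whatever threads it has:

```julia
using Distributed
addprocs(2)

# Hypothetical helper: sums the squares of `r` using the threads
# available on whichever worker it runs on.
@everywhere function threaded_sum_of_squares(r)
    # Split the range into roughly one chunk per thread.
    chunk_len = max(1, cld(length(r), Threads.nthreads()))
    chunks = Iterators.partition(r, chunk_len)
    # Spawn one task per chunk, then combine the partial sums.
    tasks = [Threads.@spawn sum(abs2, c) for c in chunks]
    return sum(fetch, tasks)
end

ranges = [1:100, 101:200, 201:300]
# pmap distributes the ranges over the workers; threading happens inside each call.
results = pmap(threaded_sum_of_squares, ranges)
println(results)
```

The result is the same whether a worker has one thread or many, so the sketch also runs (without any speedup) when JULIA_NUM_THREADS is unset.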