I have a simple question on using Distributed.jl. I am working on an 8-core machine, and I have a simple loop that I can parallelize, so I carry out the following commands:
using Distributed
addprocs(30)
@everywhere function test(n)
    return 2*n
end
pmap(x->test(x), 1:30)
Now, I have the following questions:
If I have an 8-core machine, why am I able to add more than 8 processes? I was expecting the above code to error out on the addprocs line. Does this mean I can create as many processes as I want, irrespective of the number of available cores? And if so, doesn’t creating more processes than available cores prevent perfect parallelization?
When I add n procs, I see n+1 processes. I assume this means n processes get added on top of 1 main process. However, I see that this main process does no computation while all the other processes are busy. For example, if I do addprocs(30) and then pmap(x->test(x), 1:31), the 31st function call also goes to one of the added processes rather than simply running on the main process. Why is this the case?
I am working on a remote machine where I can adjust the number of cores I request. I have a more complex code (similar to the test function, but doing much heavier computations) that can be parallelized. What’s the best approach here? My plan was to request n cores, add n processes using addprocs, and then simply use pmap. But after the above experimentation, I am not entirely sure that requesting n cores is the best thing to do.
Is there a reason you don’t use threads instead of processes? If everything runs on the same machine, threads are usually the better answer, unless you have strong reasons for the additional separation that workers provide (e.g. external libraries that otherwise can’t be used in parallel).
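For comparison, here is a minimal threaded sketch of the same computation (assuming Julia was started with multiple threads, e.g. julia --threads 8; test is just the toy function from your example):

using Base.Threads

function test(n)
    return 2*n
end

results = zeros(Int, 30)
@threads for i in 1:30   # iterations are split across the available threads
    results[i] = test(i)
end

Note that no @everywhere is needed: all threads live in the same process and share memory, so there is no code or data shipping between workers.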
To briefly answer the questions:
Yes, you can make as many processes as you want. They are the same thing as running multiple programs simultaneously, which has nothing to do with the number of CPU cores per se.
It is called addprocs because it adds workers. If you run it twice, you get twice the number of workers. The main process (id 1) handles the communication and is thus not a “worker” (see the snippet below). You can read more about the design here in the manual.
Yes, generally for optimal performance there should be exactly one worker per CPU core. Hyperthreading, thermal effects, etc. may complicate this picture a bit, so you could experiment with a varying number of workers, but generally 1 worker per core should be best.
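To make the process layout concrete, a quick check along these lines might help (the exact numbers depend on your machine, but nprocs, nworkers, myid, and Sys.CPU_THREADS are all standard):

using Distributed
addprocs(4)        # adds 4 workers
nprocs()           # 5: the main process (id 1) plus the 4 workers
nworkers()         # 4: only the added workers count
myid()             # 1 when called on the main process
Sys.CPU_THREADS    # number of logical cores, a sensible upper bound for addprocs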
Thanks @abraemer for the response. That makes things clearer.
I have used multithreading before, and it seemed to make my code slower (though that was a few years ago), so I never tried it again. Also, I have several independent mathematical optimization problems that I am trying to parallelize, and according to this post, distributed computing is more suited for large optimization problems. But maybe I am doing something wrong; I am not well-versed in all the technicalities behind multithreading and distributed computing.
Thanks @johnh. I am not using a resource manager. I use simple Slurm commands to get access to a node, and then run my script as usual. For example:
srun -N 1 --ntasks-per-node=10 --mem-per-cpu=1gb -t 00:30:00 -p interactive --pty bash

This command, run on the remote server, gives me access to one of its nodes (requesting 10 cores on that node), where I can run my code interactively for 30 minutes. After that, I run my Julia script as follows:
using Distributed
addprocs(10)
@everywhere function test(n)
    return 2*n
end
pmap(x->test(x), 1:10)
Does this look OK? I don’t intend to use a resource manager if it’s not really needed, but maybe I am missing something and actually do need one to run the code correctly.
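As a possible refinement, instead of hardcoding addprocs(10) I could presumably read the worker count from the environment Slurm sets for the allocation (I am assuming SLURM_NTASKS is populated in my setup; printenv | grep SLURM should confirm):

using Distributed

# Assumption: Slurm exports SLURM_NTASKS for this allocation; fall back to 1 if not.
n = parse(Int, get(ENV, "SLURM_NTASKS", "1"))
addprocs(n)

That way the Julia script stays in sync with whatever I pass to srun.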