I am not an expert, but:
- Workers are separate processes that run your code. You can set how many you want either with “julia -p n” (where n is the number of workers) or with addprocs(). For good performance, the machine should have at least that many CPU cores. Because each process needs to store its own copy of the data, it can sometimes be better to use threads, which share memory (or SharedArrays, which let processes on the same machine share an array), but you can get strange results if you are not careful.
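To illustrate the point about threads sharing memory, here is a minimal sketch (the names `data` and `out` are made up for the example). It is safe because each iteration writes only to its own slot; the "strange results" usually come from several threads updating the same variable without a lock or an atomic.

```julia
using Base.Threads

data = collect(1:10)
out = zeros(Int, length(data))

# All threads see the same `out` array; no copy is made per thread.
@threads for i in eachindex(data)
    out[i] = data[i]^2   # each iteration writes its own index, so there is no race
end
```

Start Julia with something like “julia -t 4” to actually get more than one thread; with a single thread the loop still runs, just serially.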
Using @everywhere is right. However, “pmap” already distributes the work across the workers, so you should not also need @distributed, I think.
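As a rough sketch of what I mean, something like this should be enough. `slow_square` is a hypothetical stand-in for your real function, and the worker count of 2 is arbitrary:

```julia
using Distributed

# Start 2 local workers (arbitrary count for this sketch) and keep their ids.
ids = addprocs(2)

# Define the function on every process so the workers can call it.
# `slow_square` is a made-up placeholder for the real workload.
@everywhere function slow_square(x)
    sleep(0.1)   # pretend this is expensive
    return x^2
end

# pmap by itself hands the calls out to the workers and collects
# the results in input order.
results = pmap(slow_square, 1:8)

# Remove the workers that were added above.
rmprocs(ids)
```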
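About rmprocs: you are not removing all the processes, only process 4. Note that process 1 is the master and cannot be removed, and in a fresh session addprocs(4) gives the workers ids 2 to 5. You can remove all the workers with:
rmprocs(workers())
or, more simply, by storing the ids returned by addprocs: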
np = addprocs(4)
…
rmprocs(np)
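If you want to check what is going on, here is a small sketch, assuming a fresh session started without “-p” (the variable `added` is just for illustration):

```julia
using Distributed

np = addprocs(4)       # returns the ids of the newly added workers
added = workers()      # ids of all current workers
println(added)

rmprocs(np)            # removes exactly the workers added above
println(nprocs())      # total process count, master included
```

After rmprocs(np), nprocs() should report only the master process, as long as no other workers were added elsewhere.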