Sorry for the slow reply. Hopefully this is still useful.
As I understand it, if you designate 4 threads, then the “master thread” is included in that total. When I parallelize over 4 threads, 4 threads are used on my machine.
I’m less familiar with @distributed, but I can comment on pmap: the only trick is to start Julia with julia -p 4, which gives parallelization over 4 worker processes (processes, not threads). For Threads.@threads, I start with the environment variable I listed above. Again, setting it to 4 means I get 4 threads that are parallelized over.
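In case it helps, here is a minimal sketch of both approaches side by side. The function slow_square is a made-up placeholder, and the addprocs(2) call is only my assumption for the case where Julia was started without -p; if you launched with julia -p 4, that branch is skipped.

```julia
using Distributed

# Add workers only if none were requested at startup (e.g. via `julia -p 4`).
# The count of 2 is an arbitrary choice for this sketch.
if nprocs() == 1
    addprocs(2)
end

# Hypothetical workload; @everywhere defines it on every process.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap hands the elements of 1:8 to the worker processes.
results = pmap(slow_square, 1:8)

println("worker processes: ", nworkers())
println("threads in this process: ", Threads.nthreads())

# Threads.@threads splits the loop over the threads of this one process.
# Each iteration writes to its own slot, so no locking is needed.
squares = zeros(Int, 8)
Threads.@threads for i in 1:8
    squares[i] = i^2
end
```

Printing nworkers() and Threads.nthreads() is a quick way to confirm that the -p flag and the environment variable actually took effect.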
As I understand the current status, the parallelization is still in testing. There’s a new @sync system in v1.3 (and an earlier version or two), but the problem is that the new parallelization system can be slower because its allocations are not yet optimized. So the same code can run faster in v1.1.1 than in v1.2/1.3 (I found a 2x slowdown between those versions). This is supposed to be fixed, but it’s not known when that will be completed.
As for your last paragraph, the way I would think about it is that whatever happens on that third process should be treated as a third task, launched in parallel with whatever is running on the other two threads. So I would write the function with something like for i = 1:3 and then have an if statement saying that the i=3 case runs the sequential commands that are independent of the other two threads. That is where I would start, but you might find something more efficient as you get more feedback and develop more code.
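Roughly, I have something like the following in mind. This is only a sketch: task_a, task_b, step_one and step_two are placeholder names I made up to stand in for your actual work.

```julia
# Hypothetical stand-ins for the real computations.
task_a() = sum(1:100)
task_b() = sum(1:200)
step_one() = 1
step_two(x) = x + 1

function run_three()
    results = Vector{Int}(undef, 3)
    # One iteration per branch; with 3 or more threads they can run concurrently.
    Threads.@threads for i = 1:3
        if i == 1
            results[i] = task_a()
        elseif i == 2
            results[i] = task_b()
        else
            # The third case runs its own commands one after another,
            # independently of the other two iterations.
            x = step_one()
            results[i] = step_two(x)
        end
    end
    return results
end

results = run_three()
```

With fewer than 3 threads the loop still gives the same answer; the branches just share threads instead of each getting its own.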