Requesting idle workers to speed up unbalanced processes with pmap

Dear Julia users,

I am currently coding an algorithm that performs the same task on several pieces of a dataset in parallel using pmap, and then gathers the results in a SharedArray. Due to the exploratory nature of the algorithm, the execution time of the task can vary greatly depending on the part of the dataset it runs on, and I have no way to predict it.

Hence, when running my code on a cluster, at some point a few workers are still busy while all the others are idle.

My question is: is it possible for busy workers to somehow “request the help” of idle workers, since other parts of the task to be performed can be parallelized?
In other words: can a worker process in turn call pmap? And how would I define the pool of idle workers inside a pmap call?

I am kind of a newbie with Julia and parallel computing, so I hope my question is clear enough.

Thank you in advance for your help! :slightly_smiling_face:


You are looking at redefining pmap, since the built-in one works on chunks. Fortunately, that is already done in the documentation: https://docs.julialang.org/en/stable/manual/parallel-computing :slight_smile:

Thanks for your answer! :slightly_smiling_face:

Are you talking about this piece of code?

function pmap(f, lst)
    np = nprocs()  # determine the number of processes available
    n = length(lst)
    results = Vector{Any}(undef, n)  # Vector{Any}(n) on Julia ≤ 0.6
    i = 1
    # function to produce the next work item from the queue.
    # in this case it's just an index.
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p=1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > n
                            break
                        end
                        results[idx] = remotecall_fetch(f, p, lst[idx])
                    end
                end
            end
        end
    end
    results
end
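If it helps to see that dynamic scheduling in action, here is a small experiment of my own (the sleep times are made up) with one long item and several short ones; it works the same with either the built-in pmap or the version above:

```julia
# (on Julia ≥ 0.7, run `using Distributed` first)
addprocs(4)

# one worker gets stuck with the 4-second item, while the remaining
# items are handed out to the other workers as they free up, so no
# worker sits idle while the queue is non-empty
@everywhere slow(t) = (sleep(t); myid())

pmap(slow, [4, 1, 1, 1, 1, 1, 1])
```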

I don’t really see how I can modify this code so that busy workers ask idle workers in the pool to help them. Maybe I am missing something simple…


It’s OK to pmap a function that itself executes another pmap of a subfunction on more granular computing tasks. Under the hood, the top-level pmap divides the array into small chunks, these chunks get divided once more into the more granular computing tasks, and those are queued. Then all workers process the queued tasks until there is nothing left in the queue.
This is a common pattern.
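If I understand the pattern correctly, a minimal sketch would look like this (fine_task, coarse_task, and the chunking are names made up for illustration; whether the inner pmap actually distributes depends on the worker pool visible on the worker process):

```julia
# (on Julia ≥ 0.7, run `using Distributed` first, also on the workers)
addprocs(4)

# fine-grained unit of work (placeholder)
@everywhere fine_task(x) = x^2

# each coarse chunk is itself processed with a nested pmap,
# so its items are queued as more granular tasks
@everywhere coarse_task(chunk) = sum(pmap(fine_task, chunk))

chunks = [1:10, 11:20, 21:30]
results = pmap(coarse_task, chunks)
```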

Yes, that is the code. There would be no idle workers if one keeps feeding them one by one as they finish their previous task.

That alternative pmap function, which I’ve used and modified myself, is good at scheduling new jobs to idle workers. But it can’t take the work from an already-busy worker and hand part of it to an idle worker. Even if it somehow could, it wouldn’t know how to divide up your task or how to move the data efficiently. It comes down to you designing and monitoring your tasks to strike a good balance between data transfer and computation. Maybe you can divide things up differently?

With regard to nested pmap: if all 32 workers are busy with a primary job, what happens when one of those workers starts a nested pmap? It seems to me that there are no more workers available, so the task would run on the primary job’s own thread, and nothing would be gained in that scenario. I can see how it might help if the workers weren’t all busy.

pmap does nothing magical; the magic happens in @async, where you hand control over to the task scheduler. With a nested pmap you do it once more. And even the nested pmap sees the whole worker pool…
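For what it’s worth, pmap also accepts an explicit pool as its second argument, and WorkerPool objects can be serialized to other processes. So a sketch like the following (my own assumption, not something tested in this thread) would let a worker-side pmap draw from a pool it is handed, rather than from its local default pool:

```julia
# (on Julia ≥ 0.7, run `using Distributed` first, also on the workers)
addprocs(4)

# an explicit pool over all workers, shared with the workers
# through the closure below
pool = WorkerPool(workers())

# the inner pmap is told which pool to draw from
@everywhere inner(p, chunk) = pmap(x -> 2x, p, chunk)

results = pmap(c -> inner(pool, c), [1:5, 6:10])
```

Note that this can oversubscribe: the outer and inner pmap calls draw from separate pools, so a worker may receive items from both at once.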


Thank you all for your answers!
Indeed, I’ll look at my code to see if I can divide up the tasks more efficiently.

However, about the fact that the nested pmap sees the whole worker pool, I tried the following code to test it:

addprocs(6)
pmap(i -> (sleep(i); println(default_worker_pool().workers)), 1:3);

The results are:

From worker 2:	Set{Int64}()
From worker 4:	Set{Int64}()
From worker 5:	Set{Int64}()

So if I understand correctly, only 3 workers should be busy here, and therefore they should see workers 3, 6, and 7 as idle, and possibly available for other tasks. So why does it return an empty set of idle workers?

I wonder what situation you have in which you would want to submit a task (some function to compute) from a worker process?

Also see this thread.