I’ve got N chunks of equally long-running work that I want to compute across fewer than N processes, but one chunk involves very different code and results than the others. So, code-wise, I’d like to schedule that chunk alone, then pmap over the remaining N-1 chunks. Something like:
using Distributed
N = 4
addprocs(N) # as an example; in reality N > number of workers
@spawnat :any begin
    println("starting unique chunk")
    sleep(2)
end
pmap(1:N-1) do i
    println("starting chunk $i")
    sleep(2)
end
The problem is that this sometimes doubles up on a worker rather than using all (in this case) 4 workers, presumably because @spawnat :any picks a worker independently of the pool that pmap draws from. E.g. here’s a typical output:
From worker 4: starting unique chunk
From worker 4: starting chunk 1
From worker 3: starting chunk 2
From worker 2: starting chunk 3
I’ve figured out that it seems to work if I use pmap in both cases:
@sync begin
    @async pmap(1:1) do _
        println("starting unique chunk")
        sleep(2)
    end
    pmap(1:3) do i
        println("starting chunk $i")
        sleep(2)
    end
end
But I feel like there must be a cleaner way to do this. Any ideas? Thanks.
Maybe you could use WorkerPools?
using Distributed
N = 12 # chunks
M = 4 # workers
addprocs(M)
wpids = workers()
uniquePool = WorkerPool(wpids[1:1])
standardPool = WorkerPool(wpids[2:end])
pmap(uniquePool, 1:1) do i
    println("starting unique chunk")
    sleep(2)
end
pmap(standardPool, 2:N) do i
    println("starting chunk $i")
    sleep(2)
end
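Since the two pools are disjoint, the two pmap calls can also run concurrently; a sketch (the @sync/@async wrapper is my addition, not part of the snippet above):

```julia
using Distributed
N = 12 # chunks
M = 4  # workers
addprocs(M)
wpids = workers()
uniquePool = WorkerPool(wpids[1:1])
standardPool = WorkerPool(wpids[2:end])
# the pools are disjoint, so both pmaps can proceed at the same time
@sync begin
    @async pmap(uniquePool, 1:1) do i
        println("starting unique chunk")
        sleep(2)
    end
    @async pmap(standardPool, 2:N) do i
        println("starting chunk $i")
        sleep(2)
    end
end
```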
Thanks for the suggestion. The problem with this, though, is that even if I launch these with @async, after finishing the unique chunk, wpids[1] sits idle, when what I want is for it to help finish the other chunks.
Well, I think that’s a little easier:
using Distributed
N = 12 # chunks
M = 4 # workers
addprocs(M)
pmap(1:N) do i
    if i == 1
        println("starting unique chunk")
        sleep(2)
    else
        println("starting chunk $i")
        sleep(2)
    end
end
Yeah, it’s definitely not impossible, it’s just awkward code-organization-wise in my case. I was hoping to find a way to queue up one chunk of work in a different part of the code from all the rest.
It seems you can push idle/new workers to an existing pool and they will be used by an ongoing computation. No idea if this is ‘safe’ or ‘correct’ though.
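A minimal sketch of that idea (assuming push! on a WorkerPool, which the Distributed stdlib provides): start the pmap on a smaller pool, then hand it the freed worker once the unique chunk is done:

```julia
using Distributed
addprocs(4)
wpids = workers()
pool = WorkerPool(wpids[2:end])  # pmap starts with workers 2..4 only
t = @async pmap(pool, 1:8) do i
    println("starting chunk $i")
    sleep(1)
end
# meanwhile, run the unique chunk on worker 1, then donate it to the pool
remotecall_fetch(wpids[1]) do
    println("starting unique chunk")
    sleep(1)
end
push!(pool, wpids[1])  # the ongoing pmap can now use worker 1 too
wait(t)
```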
Thanks, yeah, it looks like there’s a form of remotecall*(::AbstractWorkerPool), and this is exactly what it does: it takes a worker from the pool for the duration of the call and puts it back when done. So I think this is the “cleanest” way to do what I was after:
pool = default_worker_pool()
@sync begin
    @async remotecall_fetch(pool) do
        println("starting unique chunk")
        sleep(2)
    end
    @async pmap(pool, 1:N-1) do i
        println("starting chunk $i")
        sleep(2)
    end
end
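For completeness, a self-contained version of that snippet (the using/addprocs/N lines are my assumptions, not part of the post): remotecall_fetch(f, pool) borrows a worker from the pool for the duration of the call and returns it on completion, so the concurrent pmap over the same pool picks that worker up as soon as the unique chunk finishes.

```julia
using Distributed
addprocs(4)  # assumed worker count for the example
N = 4
pool = default_worker_pool()
@sync begin
    @async remotecall_fetch(pool) do
        println("starting unique chunk on worker $(myid())")
        sleep(2)
    end
    @async pmap(pool, 1:N-1) do i
        println("starting chunk $i on worker $(myid())")
        sleep(2)
    end
end
```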