I’ve got N chunks of equally long-running work that I want to compute across fewer than N processes, but one chunk runs very different code and produces different results than the others. So, code-wise, I’d like to schedule that chunk on its own, then pmap over the remaining N-1 chunks. Something like:
using Distributed
N = 4
addprocs(N) # as an example, in reality N > # of workers
@spawnat :any begin
    println("starting unique chunk")
    sleep(2)
end
pmap(1:N-1) do i
    println("starting chunk $i")
    sleep(2)
end
The problem is that this seems to sometimes double up on a worker rather than use all (in this case) 4 workers, presumably because @spawnat :any picks a worker without consulting the default worker pool that pmap draws from. E.g., here’s a typical output:
      From worker 4:	starting unique chunk
      From worker 4:	starting chunk 1
      From worker 3:	starting chunk 2
      From worker 2:	starting chunk 3
I’ve figured out that it seems to work if I use pmap in both cases:
@sync begin
    @async pmap(1:1) do _
        println("starting unique chunk")
        sleep(2)
    end
    pmap(1:3) do i
        println("starting chunk $i")
        sleep(2)
    end
end
But I feel like there must be a cleaner way to do this. Any ideas? Thanks.
              Maybe you could use WorkerPools?
using Distributed
N = 12  # chunks
M = 4  # workers
addprocs(M)
wpids = workers()
uniquePool = WorkerPool(wpids[1:1])
standardPool = WorkerPool(wpids[2:end])
pmap(uniquePool, 1:1) do i
    println("starting unique chunk")
    sleep(2)
end
pmap(standardPool, 2:N) do i
    println("starting chunk $i")
    sleep(2)
end
Thanks for the suggestion. The problem with this, though, is that even if I launch these with @async, wpids[1] sits idle after finishing the unique chunk, when what I want is for it to be helping finish the other chunks.
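For example, a minimal sketch of what I mean, assuming the uniquePool/standardPool setup from your snippet:
@sync begin
    @async pmap(uniquePool, 1:1) do i
        println("starting unique chunk")
        sleep(2)
    end
    @async pmap(standardPool, 2:N) do i
        println("starting chunk $i")
        sleep(2)
    end
end
# the unique chunk finishes after ~2s, but wpids[1] then idles:
# the standardPool pmap can never hand it a chunk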
              Well, I think that’s a little easier:
using Distributed
N = 12  # chunks
M = 4   # workers
addprocs(M)
pmap(1:N) do i
    if i == 1
        println("starting unique chunk")
        sleep(2)
    else
        println("starting chunk $i")
        sleep(2)
    end
end
Yeah, it’s definitely not impossible, it’s just awkward code-organization-wise in my case. I was hoping to find a way to queue up the one work chunk in a different part of the code from all the rest.
              It seems you can push idle/new workers to an existing pool and they will be used by an ongoing computation. No idea if this is ‘safe’ or ‘correct’ though.
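For instance, here’s a minimal sketch (push!(::WorkerPool, pid) is a documented method; whether feeding a pool mid-pmap is supported usage is the open question):
using Distributed
addprocs(4)
wpids = workers()
pool = WorkerPool(wpids[2:end])  # start the pool without the first worker
job = @async pmap(pool, 1:12) do i
    println("starting chunk $i")
    sleep(2)
end
sleep(2)               # pretend wpids[1] was busy with the unique chunk
push!(pool, wpids[1])  # hand the now-idle worker to the ongoing pmap
wait(job)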
Thanks, yeah, it looks like there’s a form of remotecall*(::AbstractWorkerPool), and that’s exactly what it does: it takes a worker from the pool and puts it back when done, so once the unique chunk finishes, that worker is free to pick up pmap work. So I think this is the “cleanest” way to do what I was after:
using Distributed  # assuming addprocs(...) has been called and N is defined as above
pool = default_worker_pool()
@sync begin
    @async remotecall_fetch(pool) do
        println("starting unique chunk")
        sleep(2)
    end
    @async pmap(pool, 1:N-1) do i
        println("starting chunk $i")
        sleep(2)
    end
end