I have access to a cluster with multiple nodes, connected through SSH, each of which has multiple CPUs. My algorithm uses DistributedArrays.jl to create a distributed array and have multiple processes work on it; at the end of the simulation, the result is returned to the master process. Since I have a lot of nodes, it doesn't make sense to distribute a single array across all of them, so I thought of running multiple simulations, each with different parameters, one per node. The problem is that if I use addprocs(p) with p = [("node1", 3), ("node2", 3), ("node3", 3), ...] (basically, I'm launching 3 processes on each node), I get a flat list of all processes, without any subdivision into nodes.
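To make the missing subdivision concrete, here is a minimal sketch of how the per-node grouping could be rebuilt by hand. The helper name workers_by_host is made up for illustration, and the sketch assumes that gethostname returns a different value on each node, which may not hold on every cluster.

```julia
using Distributed

# Hypothetical helper (not part of Distributed): group worker IDs by the
# hostname each worker reports. Assumes hostnames differ between nodes.
function workers_by_host(pids = workers())
    groups = Dict{String, Vector{Int}}()
    for w in pids
        host = remotecall_fetch(gethostname, w)   # ask worker w for its hostname
        push!(get!(groups, host, Int[]), w)       # append w to that host's group
    end
    return groups
end

# On the cluster (hostnames are placeholders):
# addprocs([("node1", 3), ("node2", 3), ("node3", 3)])
# groups = collect(values(workers_by_host()))   # one vector of worker IDs per node
```

This only recovers which processes live on which node; it does not schedule anything onto the groups.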
What I would like is something like pmap, which assigns jobs to available processes until the list of jobs is completed, but which works on groups of processes. For example, I would like to write something like this:
pmap(f, [[2, 3, 4], [5, 6, 7], [8, 9, 10]], params)
This would apply f to each combination of parameters in params, assigning each job to a group of three processes that would work in tandem until completion.
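To pin down the intended semantics, here is a rough hand-rolled sketch, not an existing function. The name group_pmap is hypothetical, and it assumes that f takes the group of worker IDs as its first argument and drives the simulation on those workers itself. Whether something built in covers this, or whether this is a reasonable approach at all, is exactly what is unclear.

```julia
using Distributed

# Hypothetical helper: call f(group, param) for every param, handing each job
# to whichever group of worker IDs is currently idle.
function group_pmap(f, groups, params)
    idle = Channel{Vector{Int}}(length(groups))   # pool of idle groups
    foreach(g -> put!(idle, g), groups)
    return asyncmap(params) do param
        g = take!(idle)        # block until some group is free
        try
            f(g, param)        # f is assumed to run the job on the workers in g
        finally
            put!(idle, g)      # hand the group back to the pool
        end
    end
end

# Intended usage (f and params as in the call above):
# group_pmap(f, [[2, 3, 4], [5, 6, 7], [8, 9, 10]], params)
```

Results come back in the order of params, and at most one job runs on a given group at a time.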
Is this possible?