@spawnat with SlurmClusterManager

My basic problem is that I have a large matrix that I would like all of my workers to read from (no need for writes), and those workers are spread across many nodes (a multi-node SLURM allocation). If there is a better approach than what I am doing, I would also appreciate advice.

My approach is to create a SharedArray on process 1 and then @spawnat it to all of the workers. This works well if you create the procs the usual way. However, it doesn’t seem to work if you create the procs using SlurmManager(). Maybe this is intended behavior, so that workers on a node other than the one hosting process 1 don’t try to access memory on a different node? If so, is there a workaround?

In an ideal world, I think the behavior should be that every time @spawnat sends the SharedArray to a worker on a new node, one copy is made on that node, and all subsequent workers on that node read from it without making further copies.

Here is an MWE. I am launching both examples from an SBATCH script that asks for only one node, so all of the processes are on the same node.

This works!

using Distributed, SlurmClusterManager
addprocs(2)

@everywhere begin
    using SharedArrays
end

x = SharedArray([1 2; 3 4])

# sending x inside the closure created by @spawnat also defines Main.x on each worker
for w in workers()
    @spawnat(w, x)
end

@everywhere begin
    println(x[2])
end

This fails with the error below, indicating that the shared-memory segment was never actually mapped on the workers: the SharedArray arrives, but its local backing array is empty.

using Distributed, SlurmClusterManager
addprocs(SlurmManager())   # the only change: workers are launched through SLURM

@everywhere begin
    using SharedArrays
end

x = SharedArray([1 2; 3 4])

for w in workers()
    @spawnat(w,x)
end

@everywhere begin
    println(x[2])
end
ERROR: LoadError: On worker 2:
BoundsError: attempt to access 0×0 Matrix{Int64} at index [2]
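
For what it is worth, a small diagnostic along these lines (a sketch, not run against a real SLURM allocation) should make the failure visible: sdata returns the locally mapped backing array of a SharedArray, so a worker that could not attach the shared segment reports a 0×0 array.

# print the size of the locally mapped backing array on every worker;
# attached workers should report (2, 2), unattached ones (0, 0)
for w in workers()
    println(w, " => ", remotecall_fetch(() -> size(sdata(x)), w))
end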

It seems the following works, though it isn’t amazingly pretty. Thoughts appreciated.

using DataFrames

@everywhere begin
    using Sockets   # gethostname lives in Sockets
    function return_id()
        return myid(), gethostname()
    end
end

# ask every worker directly for its id and hostname, so that none is skipped
out = [remotecall_fetch(return_id, w) for w in workers()]
idlst = map(x -> x[1], out)
hostlist = map(x -> x[2], out)
df = DataFrame(id = idlst, host = hostlist)
gp = groupby(df, :host)
gvec = gp.groups                 # group index of each row of df
nodeworkers = Vector{Int}[]
for i = 1:maximum(gvec)
    push!(nodeworkers, df.id[gvec .== i])
end

Then you can manually build the SharedArray across the groups of workers on the same node by iterating over the nodeworkers list made above.

for subworkers in nodeworkers
    x = SharedArray{Int64}((2, 2), init = false, pids = subworkers)
    # fill the array from a worker that actually maps the shared segment;
    # process 1 has no local mapping when subworkers live on a different node
    remotecall_fetch(() -> (x .= [1 2; 3 4]; nothing), first(subworkers))
end
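
One extra step seems to be needed (a sketch, untested across nodes): the workers in each group still need a named binding before they can read the array, so inside the loop above, after filling x, you can broadcast the handle to just that group with @everywhere. Every pid listed in pids re-attaches the existing shared segment, so this should not copy the data again.

# add inside the loop above, after the fill:
# bind the handle as Main.x on the workers of this group only; each of them
# re-attaches the same shared segment rather than receiving a copy of the data
@everywhere subworkers x = $x

Afterwards something like remotecall_fetch(() -> x[2], first(subworkers)) should read the value in place.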

You’re right to assume SharedArrays won’t work here: at most, it allows sharing the data amongst workers on the same physical node.

As a solution (and this might well be wrong, as I don’t have a SLURM setup handy to test on): doesn’t @everywhere work? i.e.:

# do some work to generate matrix M locally
M = somework()

# interpolation copies the local value of M to every worker
@everywhere M = $M

I think what you suggest will incur a large inter-node communication cost for workers on nodes other than the one hosting process 1, since each of those workers receives its own copy of M. One option might be to do what you suggest on each node, scoping @everywhere to only the processes on that node (sketched below). It is not obvious to me whether having the array on one process per node, or in one shared array across all the processes on a node, is the better solution in terms of communication time. It likely depends on the application (and I will try to check it for mine soon).
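
To make that concrete, here is a rough sketch of the one-copy-per-node variant, reusing the nodeworkers grouping and the somework() placeholder from earlier in the thread (untested on a real multi-node allocation):

# send one copy of M to a single representative worker per node, instead of
# interpolating it to every worker with a plain @everywhere
M = somework()                   # the large read-only matrix (placeholder)
for subworkers in nodeworkers
    rep = first(subworkers)      # one process per node holds the copy
    @everywhere [rep] M = $M     # only rep receives M over the network
end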

I still think that either of the two options we outlined is a bit manual, and that one might hope for better default behavior. I would really be interested to understand what settings in SlurmManager() prevent my MWE from working when all of the procs are on a single node.