Using Distributed.jl, is there no built-in way to find out which workers are on the same shared-memory node? This post shows a manual implementation based on `gethostname()`, but I don't know how robust that is. If there's nothing built-in, maybe there's already a package that helps with this?

My goal is to create one SharedArray per cluster node, accessible to all workers on that node, so I need to find out which machine each worker is running on.

It seems the following works, though it isn't amazingly pretty. Thoughts appreciated.

```julia
using Distributed, DataFrames

@everywhere function return_id(i)   # the task index `i` is unused
    return myid(), gethostname()
end

nw = length(workers())              # `nw` to avoid clashing with Distributed.nworkers
out = pmap(return_id, 1:nw)
idlst = map(x -> x[1], out)         # worker ids
hostlist = map(x -> x[2], out)      # corresponding hostnames
df = DataFrame(id = idlst, host = hostlist)
gp = groupby(df, :host)
gvec = gp.groups                    # group index of each row of df
nodeworkers = []
for i = 1:maximum(gvec)
    push!(nodeworkers, idlst[gvec .== i])   # ids of the workers on node i
end
```

Then you can manually build the SharedArray…
Distributed.map_pid_wrkr is not documented or exported, so I'd be hesitant to rely on it. And it seems like the suggestion is pretty much the equivalent of `pmap(_ -> gethostname(), procs())`?
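For the grouping itself, that one-liner could be turned into a small helper that asks each process for its hostname and collects the ids per host. A sketch (the name `workers_by_host` is hypothetical, not an existing API); it uses `remotecall_fetch` instead of `pmap` so that each hostname is guaranteed to come from the intended pid:

```julia
using Distributed

# Hypothetical helper: group process ids by the host they run on.
function workers_by_host()
    groups = Dict{String, Vector{Int}}()
    for pid in procs()
        # remotecall_fetch runs gethostname() on exactly that pid
        host = remotecall_fetch(gethostname, pid)
        push!(get!(groups, host, Int[]), pid)
    end
    return groups
end
```

With no extra workers added, this returns a single-entry Dict mapping the local hostname to `[1]`.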
I just noticed that Distributed does at least export `check_same_host(pids)` (no documentation, though), which also seems to be used by SharedArrays. While that does let you check whether the given processes are on the same node, it would be pretty cumbersome and inefficient to use it to partition the processes by node.
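Putting the pieces together: once you have a group of pids that are known to be on one host, you can back a single SharedArray by exactly those processes. A minimal single-node sketch (with no workers added, `procs()` is just `[1]`; in a real multi-node setup each node's group of pids would get its own array):

```julia
using Distributed, SharedArrays

# All local processes; check_same_host verifies they share a node.
pids = procs()
@assert Distributed.check_same_host(pids)

# One SharedArray backed by (and visible to) exactly those processes.
A = SharedArray{Float64}(10; pids = pids)
A[1] = 42.0
```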