Using Distributed.jl, is there no built-in way to find out which workers are on the same shared-memory node? This post shows a manual implementation based on `gethostname()`, but I don't know how robust that is. If there's nothing built-in, maybe there's already a package that helps with this?

My goal is to create one SharedArray per cluster node, accessible to all workers on that node, so I need to find out which machine each worker is running on.

It seems the following works, though it isn't amazingly pretty. Thoughts appreciated.

```julia
using Distributed, DataFrames

@everywhere function return_id(i)   # the task index `i` is unused
    return myid(), gethostname()
end

nw = length(workers())              # `nw` to avoid clashing with Distributed.nworkers
out = pmap(return_id, 1:nw)
idlst = map(x -> x[1], out)         # worker ids
hostlist = map(x -> x[2], out)      # corresponding hostnames
df = DataFrame(id = idlst, host = hostlist)
gp = groupby(df, :host)
gvec = gp.groups                    # group index of each row of df
nodeworkers = []
for i = 1:maximum(gvec)
    push!(nodeworkers, idlst[gvec .== i])   # ids of the workers on node i
end
```

Then you can manually build the SharedArray…
Distributed.map_pid_wrkr is not documented or exported, so I'd be hesitant to rely on it. And it seems like the suggestion is pretty much the equivalent of `pmap(_ -> gethostname(), procs())`?
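For the grouping itself, that one-liner could be turned into a small helper that asks each process for its hostname and collects the ids per host. A sketch (the name `workers_by_host` is hypothetical, not an existing API); it uses `remotecall_fetch` instead of `pmap` so that each hostname is guaranteed to come from the intended pid:

```julia
using Distributed

# Hypothetical helper: group process ids by the host they run on.
function workers_by_host()
    groups = Dict{String, Vector{Int}}()
    for pid in procs()
        # remotecall_fetch runs gethostname() on exactly that pid
        host = remotecall_fetch(gethostname, pid)
        push!(get!(groups, host, Int[]), pid)
    end
    return groups
end
```

With no extra workers added, this returns a single-entry Dict mapping the local hostname to `[1]`.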
I just noticed that Distributed does at least export `check_same_host(pids)` (no documentation, though), which also seems to be used by SharedArrays. While that does let you check whether the given processes are on the same node, it would be pretty cumbersome and inefficient to use it to partition the processes by node.
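Putting the pieces together: once you have a group of pids that are known to be on one host, you can back a single SharedArray by exactly those processes. A minimal single-node sketch (with no workers added, `procs()` is just `[1]`; in a real multi-node setup each node's group of pids would get its own array):

```julia
using Distributed, SharedArrays

# All local processes; check_same_host verifies they share a node.
pids = procs()
@assert Distributed.check_same_host(pids)

# One SharedArray backed by (and visible to) exactly those processes.
A = SharedArray{Float64}(10; pids = pids)
A[1] = 42.0
```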