Changing default interface for Julia worker

I am running a distributed workload with Distributed.jl on a cluster that has IP over InfiniBand (IPoIB) set up. However, when the workers are created, they automatically report the IP address of the slow Ethernet interface. A workaround I found is to edit line 1279 of cluster.jl in stdlib/Distributed, changing the bind address from the result of getipaddr() to the IP address of the InfiniBand interface. I am aware that you can change the bind address of a worker via the --bind-to flag; however, this applies to a single worker only and is complicated if you can only access the cluster through a reservation system. So my question is: is there a way to set a preferred interface for Julia workers without having to compile a custom Julia version?

Alternative solutions using, for example, a ClusterManager are also very welcome.

Hello,

I had similar problems in the past and solved them by adding these lines to the LSF ClusterManager:

cmd = `cd $dir ";" hostname -i "|" xargs $exename $exeflags $(worker_arg) --bind-to `
bsub_cmd = `bsub -I -x $(manager.flags) -cwd $dir -J $jobname "$cmd"`

That way each worker looked up its own correct IP address and passed it to --bind-to. There may be a cleaner solution (I was much less familiar with Julia back then, and there appear to be some cleaner options in the chat, YMMV :)), but that system has since been decommissioned and I thankfully no longer need the workaround.
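One such option, if the workers can be started over SSH, may be the machine specification accepted by addprocs. As far as I know it has the form "[user@]host[:port] [bind_addr[:port]]", so a bind address can be given per host. Below is a minimal sketch; the host names, the addresses, and the machine_spec helper are made up for illustration, and this may not fit a reservation system that does not allow SSH.

```julia
using Distributed

# Hypothetical mapping: SSH host name => IPoIB address of that host.
ib_addrs = ["node01" => "10.10.0.1", "node02" => "10.10.0.2"]

# Hypothetical helper: build a "host bind_addr" machine specification string.
machine_spec(host, addr) = "$host $addr"

# One worker per host, each told to bind to its IPoIB address.
machines = [(machine_spec(h, a), 1) for (h, a) in ib_addrs]

# Uncomment on a cluster with passwordless SSH to the hosts above:
# addprocs(machines)
```

The other workers should then connect to each worker at the given bind address rather than at whatever getipaddr() returns on that host.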

A general point about “compiling a custom Julia version”: Julia is a dynamic language, so if you only need to edit one line to make this work, you can do

@eval Distributed function init_bind_addr()
....
your_edits
....
end

That replaces the function, and Julia should automatically recompile any dependent code as needed at runtime.
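For the edit itself, one possibility is to pick the address by subnet instead of hardcoding it per node. This is only a sketch: pick_addr is a hypothetical helper, and "10.10." is an assumed IPoIB subnet prefix that you would replace with your own. It relies on getipaddrs() from Sockets, which needs Julia 1.2 or later.

```julia
using Sockets

# Hypothetical helper: return the first IPv4 address whose dotted-decimal
# string starts with `prefix`, or `nothing` if no address matches.
function pick_addr(addrs, prefix::AbstractString)
    idx = findfirst(a -> a isa IPv4 && startswith(string(a), prefix), addrs)
    return idx === nothing ? nothing : addrs[idx]
end

# All non-loopback addresses of this host.
candidates = getipaddrs()

# "10.10." is an assumed IPoIB subnet; `ib_addr` is `nothing` if this host
# has no address on it, so a fallback to getipaddr() is still needed.
ib_addr = pick_addr(candidates, "10.10.")
```

Inside the replaced function, the result could be used where getipaddr() is called now, falling back to getipaddr() when it is nothing.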

Hope this helps,

Cheers!