Thank you for your suggestion.
Following your comment, I realized I had drawn the wrong conclusion. In my case the ports are all different on a given IP (node). They only overlap across nodes, but as you pointed out, this should not actually be a problem. Practice supports this: I had several runs with overlapping ports across nodes without the “connection refused” error.
So I don’t know where the `connection refused` error comes from. Most probably some nodes cannot connect to each other over a given port. It should be related to the cluster configuration, but the administrators have not been able to give me an answer yet.
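To narrow down which node pairs and ports are affected, a small probe like the one below might help. It is only a sketch: `can_connect` is a hypothetical helper name, not part of `Distributed` or `ClusterManager.jl`, and it relies only on `connect` from the `Sockets` standard library. It reports whether a plain TCP connection to a given host and port succeeds.

```julia
using Sockets

# Hypothetical helper (not part of any package): returns true if a TCP
# connection to host:port can be opened, false if it fails with an IOError
# (for example "connection refused").
function can_connect(host::AbstractString, port::Integer)
    try
        sock = connect(host, port)
        close(sock)
        return true
    catch err
        # Only swallow socket-level I/O errors; rethrow anything else
        # (for example a DNS lookup failure).
        err isa Base.IOError || rethrow()
        return false
    end
end
```

Running this on one node against the hostname and listening port of another node would show whether that port is reachable at all. Note that there is no timeout here, so a port that silently drops packets may make the call block for a while instead of returning `false`.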
I could not implement the suggestion in the link you provided, because the Perl workaround also generates unwanted quotes (maybe this is new in recent versions?). Actually, after reading about the `Cmd` object, it seems that this is the expected behavior: it prevents code injection from Julia on the cluster [1].
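As a minimal illustration of that behavior (the string `arg` is just a made-up example value), interpolating a string into a backtick command keeps it as a single argument instead of letting a shell re-parse it:

```julia
# Made-up example value containing a shell metacharacter.
arg = "foo; echo injected"

# Interpolation into a Cmd does not go through a shell:
# the whole string becomes exactly one argument.
cmd = `echo $arg`

# The argument vector has two entries: the program name and the
# interpolated string, left intact.
println(cmd.exec)
```

The quotes that appear when the command is printed mark the boundaries of that single argument, which is why they cannot simply be stripped by building the string differently.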
I also don’t know how to control the port range parameters when using `ClusterManager.jl`. Doing so could solve my problem, if it is indeed related to ports.
I am still investigating this. If anyone has pointers on port configuration in Julia’s `Distributed` and `ClusterManager.jl`, or on the cause of the TCP socket `connection refused` error, any reference would be helpful.
[1] See this thread