Ok, I think I’ve found the cause of the issue. It seems the cluster blocks TCP connections from the login nodes to the compute nodes. I thought Julia only needed the other direction to be allowed, since the worker processes on the compute nodes connect to the login node. But after digging into the code a bit, it looks like after the initial connection the master process initiates a connection in the other direction (https://github.com/JuliaLang/julia/blob/v1.1.0/stdlib/Distributed/src/managers.jl#L437), and it's this connection that's failing. I'm not sure what a good workaround would be here…
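If anyone wants to confirm the same thing on their cluster, here's a rough sketch of a check for the login → compute direction using only the `Sockets` stdlib. `can_connect` is a hypothetical helper, not part of `Distributed`. It just tries a plain TCP `connect`, and it adds a timeout because a firewall that silently drops packets can otherwise leave `connect` hanging for a long time.

```julia
using Sockets

# Hypothetical helper: try to open a TCP connection to `host:port`, i.e. the
# direction the master uses when it connects to a worker. Returns `true` on
# success, `false` if the connection is refused, errors, or does not complete
# within `timeout` seconds.
function can_connect(host, port::Integer; timeout::Real = 5.0)
    result = Channel{Bool}(2)  # room for both the connect task and the timer
    @async begin
        ok = try
            close(connect(host, port))
            true
        catch
            false
        end
        put!(result, ok)
    end
    # If the connect task hasn't reported anything by the deadline, report failure.
    timer = Timer(_ -> isready(result) || put!(result, false), timeout)
    ok = take!(result)
    close(timer)
    return ok
end
```

Usage (host name and port are placeholders): on a compute node, start a throwaway listener with `port, server = listenany(ip"0.0.0.0", 9000)`, then on the login node run `can_connect("compute-node-01", port)`. If that returns `false` while the equivalent check from the compute node to the login node returns `true`, the firewall is blocking only the login → compute direction.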
(Btw, for anyone wondering: it turns out a good way to insert debug statements into a Julia stdlib or Base without recompiling Julia is just to `@eval` them directly into those modules.)
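A minimal self-contained sketch of the `@eval` trick. `FakeStdlib` and `double` are hypothetical stand-ins. With the real thing you'd write `@eval Distributed function ... end` with a method body copied from the stdlib source plus your prints. The signature has to match the original exactly so the new definition replaces the existing method instead of adding a second one.

```julia
# Hypothetical stand-in for a stdlib module such as Distributed or Base.
module FakeStdlib
double(x) = 2x
end

# `@eval Mod expr` evaluates `expr` inside module `Mod`, so this definition
# replaces FakeStdlib.double(::Any) in the running session. Julia itself is
# not rebuilt; only the new method body gets compiled on its next call.
@eval FakeStdlib function double(x)
    @info "double called" x   # the inserted debug statement
    return 2x
end

FakeStdlib.double(21)  # emits an Info log with x = 21 and returns 42
```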