Running a distributed calculation on JuliaHub

Hi!

I’m trying to run a distributed nested loop on JuliaHub with 3-10 workers. The code evaluates a marginal likelihood function on a 3-d grid with the help of a distributed array:

logML_mat = Array{Float64}(undef, np, ngl, ngt, ngs)
logML_mat = distribute(logML_mat; dist=(1,1,1,nworkers()))
@sync @distributed for w in 1:nworkers()
    logML_mat_loc = localpart(logML_mat)
    li = localindices(logML_mat)
    for (ls,s) in enumerate(li[4])
        for p in 1:np
            for i in 1:ngl
                for j in 1:ngt
                    logML_mat_loc[p,i,j,ls] = logML([lambda_grid[i], theta_grid[j], psi_grid[s]])
                end
            end
        end
    end
end

The code runs successfully, but after the outer (distributed) loop is finished, the following error occurs:

LoadError: LoadError: LoadError: TaskFailedException nested task error: On worker 2: TaskFailedException nested task error: peer 3 is not connected to 2. Topology : master_worker

What might be the cause?

Best,
Andrey

This failure is caused by the communication topology of the cluster, which only allows direct communication between the head node and the worker nodes. Workers cannot connect to each other, which is what "peer 3 is not connected to 2. Topology : master_worker" is reporting. @tanmaykm is looking at enabling all-to-all communication and will post an update here after he looks into it.
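In the meantime, one possible workaround is to avoid worker-to-worker traffic altogether: let each worker compute a slice of the grid and send it back to the head node, which assembles the full array. The sketch below uses only `pmap` from `Distributed`, so all communication goes through the head node. Note that `logML` here is a hypothetical stand-in for your function, and the helper names `logML_slice` and `logML_grid` and the small example grids are made up for illustration. I have not tested this on JuliaHub itself.

```julia
using Distributed

# Hypothetical stand-in for the real marginal likelihood; replace with your own.
@everywhere logML(x) = -sum(abs2, x)

# Compute one slice of the grid (a single psi value) entirely on one process.
@everywhere function logML_slice(psi, lambda_grid, theta_grid, np)
    out = Array{Float64}(undef, np, length(lambda_grid), length(theta_grid))
    for p in 1:np, i in eachindex(lambda_grid), j in eachindex(theta_grid)
        out[p, i, j] = logML([lambda_grid[i], theta_grid[j], psi])
    end
    return out
end

# The head node hands out one psi value per task and collects the slices,
# so workers only ever talk to the head node.
function logML_grid(lambda_grid, theta_grid, psi_grid, np)
    slices = pmap(psi -> logML_slice(psi, lambda_grid, theta_grid, np), psi_grid)
    return cat(slices...; dims=4)  # np × ngl × ngt × ngs
end

# Hypothetical small grids, for illustration only.
lambda_grid = range(0.1, 1.0; length=4)
theta_grid  = range(0.1, 1.0; length=3)
psi_grid    = range(0.1, 1.0; length=5)
np = 2

logML_mat = logML_grid(lambda_grid, theta_grid, psi_grid, np)
```

The result is an ordinary `Array` on the head node rather than a distributed array, so this only makes sense if the full grid fits in the head node's memory. If no workers are attached, `pmap` simply runs on the current process, so the same script can be tried locally first.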