Getting started with distributed Julia computations on a cluster

The distributed equivalent of a SharedArray is a DArray from the DistributedArrays.jl package. This should do something like what you were doing:

julia> using Distributed, DistributedArrays  # DArray comes from DistributedArrays.jl

julia> DArray((10,)) do I
           # I is a tuple of index ranges describing this worker's block
           a = Array{Float64}(undef, map(length, I))
           for (ind, i) in enumerate(I[1])
               a[ind] = i
           end
           a
       end
10-element DArray{Float64,1,Array{Float64,1}}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0

Here `I` is a tuple of index ranges describing the local part of the array on each worker.
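If you need to work with the pieces directly, DistributedArrays also provides localpart and localindices. A minimal sketch, assuming workers have already been added with addprocs:

using Distributed, DistributedArrays

d = dones(10)  # a distributed vector of ones, split across the workers

# Run on a worker and fetch the result back: the index range it owns,
# and the underlying local array.
r = fetch(@spawnat workers()[1] localindices(d))
p = fetch(@spawnat workers()[1] localpart(d))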

Have a look at ClusterManagers.jl to see if it helps with launching jobs on your cluster.
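For instance, on a Slurm cluster you can request workers through the scheduler; the partition name and time limit below are placeholders that you'd adapt to your cluster's settings (keyword arguments are forwarded to srun):

using Distributed, ClusterManagers

# Ask Slurm for 56 worker processes (2 nodes × 28 cores in the
# benchmark further down).
addprocs_slurm(56, partition="debug", t="00:30:00")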

For jobs that are embarrassingly parallel, look into pmap. This keeps track of free workers and submits jobs to them, ensuring that the nodes are evenly loaded.
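As a toy illustration (the workload here is made up), each iteration is an independent job and runs on whichever worker is free:

using Distributed

# pmap hands one element to each free worker and assigns the next
# as soon as a worker finishes, balancing uneven task durations.
results = pmap(1:4*nworkers()) do i
    sum(abs2, randn(10^6))  # stand-in for an expensive independent task
end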

I’ve found that a parallel mapreduce can be sped up by using binary-tree-based reductions locally on each node instead of the @distributed (op) for loops in Distributed. I’ve written a package that does this (it provides the pmapsum used below), although it isn’t widely tested. For comparison, on a Slurm cluster using 2 nodes with 28 cores each, I obtain

julia> @time @distributed (+) for i=1:nworkers()
           ones(10_000, 1_000)
       end;
 22.355047 seconds (7.05 M allocations: 8.451 GiB, 6.73% gc time)

julia> @time pmapsum(x -> ones(10_000, 1_000), 1:nworkers());
  2.672838 seconds (52.83 k allocations: 78.295 MiB, 0.53% gc time)
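The idea behind the tree reduction can be sketched in a few lines using only Distributed. This is a generic pairwise version for illustration, not the package's actual implementation; treereduce and treesum are hypothetical names:

using Distributed

# Reduce a list of pending results pairwise: each pass halves the number
# of futures, and each combine runs on the worker holding the first
# operand (fetch inside the remotecall pulls the second operand there).
function treereduce(op, futures::Vector{<:Future})
    while length(futures) > 1
        paired = Future[]
        for i in 1:2:length(futures)
            if i == length(futures)
                push!(paired, futures[i])  # odd one out, promote unchanged
            else
                a, b = futures[i], futures[i+1]
                push!(paired, remotecall((x, y) -> op(fetch(x), fetch(y)), a.where, a, b))
            end
        end
        futures = paired
    end
    return fetch(futures[1])
end

# Evaluate f once on every worker and sum the results pairwise,
# similar in spirit to the pmapsum call above.
treesum(f) = treereduce(+, [remotecall(f, w) for w in workers()])

Compared to a sequential fold of the partial results, the pairwise scheme needs only O(log n) combining steps on the critical path, which matters when each operand is a large array.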