Getting started with distributed Julia computations on a cluster

gideonsimpson · September 27, 2020, 4:57pm

Thus far, my parallel Julia experience has been limited to running jobs on a single cluster node (a shared-memory environment), using the Distributed standard library for computations like:

using Distributed, SharedArrays

a = SharedArray{Float64}(10)
# @sync blocks until every worker has finished writing its share of a
@sync @distributed for i = 1:10
    a[i] = i
end

That is, computations where there is, at most, a reduction operation among the workers.
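
For reference, a reduction of that kind might look like the sketch below. It uses only the Distributed standard library; the two local workers and the sum-of-squares loop body are arbitrary choices for illustration.

```julia
using Distributed

# Start two local workers if none exist yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)

# With a reducer, @distributed waits for all workers and returns the
# combined result of the last expression in each iteration.
total = @distributed (+) for i = 1:10
    i^2
end

println(total)
```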

If possible, I would like to begin taking advantage of multiple nodes on the cluster I have access to, but I’m having some trouble getting a sense of how to get started using Julia in a distributed memory environment. Does anyone have any suggestions on how to get started? If it’s at all helpful, the cluster I’m using runs the Univa Grid Engine.

jishnub · September 27, 2020, 7:06pm

The distributed equivalent of a SharedArray is a DArray, from the DistributedArrays.jl package. Something like this should do what you were doing:

julia> using DistributedArrays

julia> DArray((10,)) do I
           # allocate the local block, one length per dimension
           a = Array{Float64}(undef, map(length, I))
           for (ind, i) in enumerate(I[1])
               a[ind] = i
           end
           a
       end
10-element DArray{Float64,1,Array{Float64,1}}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0

Here I is a tuple of index ranges, one per dimension, describing the local part held by each worker.

Have a look at ClusterManagers.jl to see whether it helps with launching jobs on your cluster.
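
If no ready-made manager fits, one fallback is the standard library's SSH-based addprocs, which accepts a list of (hostname, count) pairs. The sketch below is an illustration under several assumptions, not something taken from ClusterManagers.jl: it assumes the scheduler writes a host file whose lines begin with a hostname and a slot count, that the file's path is in the PE_HOSTFILE environment variable (as Grid Engine parallel environments commonly do), and that passwordless SSH works between nodes. The names parse_hostfile and launch_workers are hypothetical.

```julia
using Distributed

# Parse host-file text into (hostname, slots) pairs.
# Assumption: each non-empty line starts with "hostname slots".
function parse_hostfile(text::AbstractString)
    hosts = Tuple{String,Int}[]
    for line in split(text, '\n')
        fields = split(line)
        length(fields) >= 2 || continue   # skip blank or malformed lines
        push!(hosts, (String(fields[1]), parse(Int, fields[2])))
    end
    return hosts
end

# Hypothetical helper: start workers on the hosts named in the file that
# PE_HOSTFILE points to, or two local workers when the variable is unset.
function launch_workers()
    path = get(ENV, "PE_HOSTFILE", "")
    isempty(path) && return addprocs(2)
    return addprocs(parse_hostfile(read(path, String)))  # SSH-based launch
end

# Made-up host-file contents, used only to exercise the parser.
example = "node01 4 all.q@node01 UNDEFINED\nnode02 4 all.q@node02 UNDEFINED\n"
println(parse_hostfile(example))
```

Calling launch_workers() at the top of a job script would then make the workers available to pmap, @distributed, and DArray. Check your cluster's documentation before relying on the host-file layout assumed here.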

For jobs that are embarrassingly parallel, look into pmap. This keeps track of free workers and submits jobs to them, ensuring that the nodes are evenly loaded.
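
A minimal pmap sketch, assuming two local workers as a stand-in for cluster nodes; the sleep call is just a placeholder for an expensive, independent task:

```julia
using Distributed

# Start two local workers if none exist yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)

# Each input is handed to whichever worker is free next; the results
# come back in the same order as the inputs.
results = pmap(1:8) do x
    sleep(0.1)   # placeholder for real work
    x^2
end

println(results)
```

Because the tasks are handed out one at a time, a slow task on one worker does not hold up the others.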

I’ve found that a parallel mapreduce can be sped up by using binary-tree-based reductions locally on each node instead of the @distributed (op) for loops in Distributed. I wrote a package to do this, although it isn’t widely tested. For comparison, on a Slurm cluster using 2 nodes with 28 cores each, I obtain:

julia> @time @distributed (+) for i=1:nworkers()
           ones(10_000, 1_000)
       end;
 22.355047 seconds (7.05 M allocations: 8.451 GiB, 6.73% gc time)

julia> @time pmapsum(x -> ones(10_000, 1_000), 1:nworkers());
  2.672838 seconds (52.83 k allocations: 78.295 MiB, 0.53% gc time)
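
To give a rough feel for the idea of reducing locally before combining, here is a simplified sketch. It is not the package's implementation: each worker reduces its own chunk, and the pairwise (binary-tree) merge of the partial results runs on the calling process rather than across nodes. The helper name tree_reduce, the chunk size, and the sum-of-squares workload are all made up for illustration.

```julia
using Distributed

# Start two local workers if none exist yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)

# Combine a non-empty vector pairwise, roughly halving it each round.
function tree_reduce(op, xs::AbstractVector)
    vals = collect(xs)
    while length(vals) > 1
        next = [op(vals[i], vals[i + 1]) for i in 1:2:length(vals) - 1]
        isodd(length(vals)) && push!(next, vals[end])  # carry the unpaired element
        vals = next
    end
    return vals[1]
end

# Stage 1: each chunk is reduced on a worker, so only one number per
# chunk is sent back to the calling process.
chunks = collect(Iterators.partition(1:1_000, 100))
partials = pmap(chunk -> sum(x -> x^2, chunk), chunks)

# Stage 2: merge the per-chunk partial results pairwise.
total = tree_reduce(+, partials)
println(total)
```

The saving in the timings above comes from moving less data to a single process; with large arrays as the reduced values, that difference matters far more than in this scalar example.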
