Basics of the `@everywhere` and `@distributed` macros

A better approach would be to use a library that has an explicit notion of remote data, such as Dagger.jl. The example could be rewritten like this:

using Distributed, ClusterManagers

addprocs(40)
println(nworkers())

@everywhere using Dagger

summark = Dagger.@shard myid()
# `summark` is an object which points to the result of `myid()` on all workers in the cluster
map(println, summark)

@everywhere myprint(i, s) = println("$(myid()) , i = $i, summark = $s")
@sync for i = 1:100
    Dagger.@spawn myprint(i, summark)
end

This approach is better because:

  • There is no weirdness with global variables (instead there’s just one local variable which points to other “global” variables)
  • Dagger.@shard is explicitly built for this purpose, and you will always get the right value for whichever worker the code runs on
  • You don’t need to express your logic in a for loop; you can use whatever control flow patterns make sense for you

Note that with this example, you’re not guaranteed a perfectly even distribution of prints across the cluster, but generally Dagger will tend to balance the tasks evenly over time.
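If you want to see how the tasks actually ended up spread across processes, one option is to have each task return `myid()` and tally the results. This is only a rough sketch: the worker count of 2 and the names `tasks`, `ids`, and `counts` are made up for illustration, and the exact split will vary from run to run.

```julia
using Distributed

addprocs(2)                 # hypothetical small worker count, just for the demo
@everywhere using Dagger

# Each task returns the id of the process it happened to run on
tasks = [Dagger.@spawn myid() for _ in 1:100]
ids = fetch.(tasks)         # wait for every task and collect its result

# Tally how many tasks each process executed
counts = Dict{Int,Int}()
for id in ids
    counts[id] = get(counts, id, 0) + 1
end

for (id, n) in sort(collect(counts))
    println("process $id ran $n tasks")
end
```

The tallies always add up to 100, but the share per process is up to Dagger's scheduler, so don't rely on any particular split.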
