A better approach is to use a library that has an explicit notion of remote data, such as Dagger.jl. The example above could be rewritten as:
```julia
using Distributed, ClusterManagers
addprocs(40)
println(nworkers())
@everywhere using Dagger

# `summark` is an object which points to the result of `myid()` on all
# workers in the cluster
summark = Dagger.@shard myid()
map(println, summark)

@everywhere myprint(i, s) = println("$(myid()), i = $i, summark = $s")
@sync for i = 1:100
    Dagger.@spawn myprint(i, summark)
end
```
This approach is better because:
- There is no weirdness with global variables (instead there's just one local variable which points to other "global" variables)
- `Dagger.@shard` is explicitly built for this purpose, and you will always get the right value for whichever worker the code runs on
- You don't need to express your logic in a for loop; you can use whatever control flow patterns make sense for you
Note that with this example, you’re not guaranteed a perfectly even distribution of prints across the cluster, but generally Dagger will tend to balance the tasks evenly over time.
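To illustrate the "no for loop required" point, here's a sketch of the same pattern driven by a comprehension instead, with results collected via `fetch`. The `work` helper and the worker count of 4 are just for illustration; the rest uses Dagger's eager `@spawn`/`fetch` API:

```julia
using Distributed
addprocs(4)  # small pool just for this sketch
@everywhere using Dagger

# hypothetical helper: report which worker ran the task, plus a computed value
@everywhere work(i) = (myid(), i^2)

# spawn tasks from a comprehension rather than a for loop;
# each `Dagger.@spawn` returns a task handle immediately
tasks = [Dagger.@spawn work(i) for i in 1:10]

# `fetch` blocks until each task completes and returns its result
results = fetch.(tasks)
```

Each element of `results` is a `(worker id, value)` pair, and as above, Dagger decides worker placement for you, balancing tasks over time rather than guaranteeing a perfectly even split.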