I’m working with a DArray from DistributedArrays.jl. I want each process to work on its own part of the array independently, and to access the rest of the array only when it needs a nearest neighbor that doesn’t reside in its local part.
The way I’m currently handling this is with this function:
using Distributed

with_workers(f, args...; procs = workers()) = @sync for w in procs
    @spawnat w f(args...)
end
The idea is to combine with_workers with a do...end block:
with_workers() do
    #=
    Do stuff with the local array.
    Here, a worker may access a cell of the DArray
    which resides on another process.
    =#
end
My question is: is this an efficient way of handling the parallelism? I can’t use @distributed or pmap, because I explicitly need each worker to operate on its own local part of the array, and those two constructs hand work to whichever worker happens to be free rather than to a specific one. Also, the algorithm updates the array in sweeps, so every worker has to finish the current update before any of them can start on the next one.
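For concreteness, here is a minimal sketch of how that barrier between sweeps looks with with_workers; update_local!, nsweeps, and the DArray A are hypothetical placeholders for the actual update kernel, iteration count, and lattice:

for sweep in 1:nsweeps
    with_workers() do
        update_local!(A)   # hypothetical kernel updating this worker's part of A
    end                    # the @sync inside with_workers acts as the barrier
end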
I’m asking because my algorithm seems a bit slow, and I’m not sure whether I’m doing something wrong with parallelism and DistributedArrays. Here is my GitHub repo in case you want to check out the entire project: Lattice.jl deals with the construction of the lattice, and CabibboMarinari.jl deals with the updating algorithm itself.
I have the same problem and I’m looking for ways to solve it, too. @distributed and pmap are unsuitable for our needs; in my case they produce no speedup at all.
I also have the feeling that Distributed.jl is too slow, but I don’t know whether I’m doing something wrong or whether it just isn’t useful unless you have really big arrays.
So there’s really no solution at the moment, right? Spawning tasks and setting array elements are necessary operations, and I’m not sure there’s a more efficient way to do them. Maybe allocating a local array and then assigning it to the local portion of the DArray is more efficient than accessing the DArray every time a single element is set?
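A minimal sketch of that batched-write idea, assuming a hypothetical per-cell kernel update_cell; localpart is the DistributedArrays.jl function that returns the chunk stored on the calling worker:

using Distributed, DistributedArrays

function update_local!(d::DArray)
    loc = localpart(d)                 # this worker's chunk, an ordinary Array
    buf = similar(loc)                 # plain local buffer, cheap to index
    for I in CartesianIndices(loc)
        buf[I] = update_cell(loc, I)   # hypothetical per-cell update
    end
    loc .= buf                         # write back in one bulk assignment
end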
Yes, multi-threading is wonderful, and I have a multi-threaded version of my program. But in production my program will need several terabytes of memory and thousands of cores, so it has to run on a cluster and multi-threading alone isn’t enough. Anyway, thanks!
I tried making all the large arguments of my functions DArrays, and that makes spawning the tasks very fast. It does increase the execution time of the functions themselves, because the data has to be fetched remotely inside the function, but it is still better than spawning tasks with large arguments.
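A sketch of the difference, assuming a worker with id 2 exists; a DArray handle serializes as lightweight metadata, whereas a plain Array is copied wholesale along with the task:

using Distributed, DistributedArrays

A  = rand(2_000, 2_000)    # plain Array
dA = distribute(A)         # same data, scattered across the workers

# The closure below captures A, so the whole ~32 MB array is serialized
# to worker 2 together with the task:
slow = @spawnat 2 sum(A)

# Here only the small DArray handle is captured; worker 2 reads the chunk
# it already holds (note: this sums just that chunk, not the whole array):
fast = @spawnat 2 sum(localpart(dA))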
I wonder if there are functions in Distributed like Scatterv in MPI… really confused now.
I did a benchmark a while ago measuring whether it is better to spawn a function passing arguments, or to have it capture the environment as an anonymous function. Strangely enough, it appears that passing the argument to the function is a bit slower than having it capture the environment. Not sure if I did something wrong.
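For reference, a minimal sketch of that kind of comparison (work, data, and the worker id 2 are placeholder assumptions; the exact timings depend on the setup):

using Distributed, BenchmarkTools
addprocs(1)
@everywhere work(x) = sum(x)

data = rand(10^6)

# Variant 1: pass the data as an explicit remotecall argument
@btime fetch(remotecall(work, 2, $data))

# Variant 2: capture the data from the enclosing scope in a closure
@btime fetch(remotecall(() -> work($data), 2))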