Actually, I find that two parts of my work take the most time:
- spawning the tasks, because the arguments passed to the function are large in my case;
- passing parts of the array as a SubDArray, which is inefficient, as noted in the DistributedArrays.jl source code (darray.jl on master in JuliaParallel/DistributedArrays.jl on GitHub, at line 819).
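
One way to sidestep both costs might be to have each worker operate on the chunk it already owns, instead of sending large arguments or SubDArray slices with every task. Below is a minimal sketch, assuming the data is already in a `DArray` and the per-chunk work can be combined afterwards. The helper name `sum_by_localparts`, the use of `sum` as a stand-in for the real work, and the choice of two workers are my own assumptions for illustration, not something from my actual code.

```julia
using Distributed
addprocs(2)                          # assumption: two local workers are enough for a demo
@everywhere using DistributedArrays

# Hypothetical helper: each worker reduces only its own chunk.
# The spawned closure refers to the DArray `d` itself; as far as I understand,
# that ships a lightweight reference rather than the array contents.
function sum_by_localparts(d::DArray)
    futures = [@spawnat p sum(localpart(d)) for p in vec(procs(d))]
    return sum(fetch.(futures))      # combine the small per-worker results
end

A = rand(1_000)
d = distribute(A)                    # split A across the workers
total = sum_by_localparts(d)
```

Only the small per-worker results travel back to the calling process, so the cost of each task should no longer scale with the size of the array. Whether that actually helps depends on how the real function uses its arguments.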