Huge distributed overhead

My code looks something like this. First, I have a file where I define my functions:

# Functions.jl
# Import the relevant packages and define all the functions with @everywhere

using Distributed
@everywhere using SharedArrays

@everywhere function Solver(A, N)
    B = SharedArray{Float64}(N)
    @sync @distributed for i = 1:N
        ... # Some operations over A that depend on i. The results are saved in B
    end
    return B
end

Then I have a second file that calls the functions:


using DelimitedFiles # provides writedlm

for j = 1:M # sequential for loop
    A = ... # generates sparse array A. This array is very large.
    B = Solver(A, N)
    writedlm("Data.csv", B)
end

With this code I see a massive overhead. For nworkers <= 3 everything works as expected: each “Data.csv” file is produced in around 5 min. (The number of workers matches the size of the distributed for loop, N, so we can take nworkers = N.)
However, for nworkers > 3 it takes around 25 min to produce one file. This time does not depend on nworkers once nworkers > 3: writing a file takes as long for N = nworkers = 5 as for N = nworkers = 32.

I have no idea what could be going on. One of my hypotheses was that the input sparse matrix A was being copied N times. However, my last attempt at a fix was to declare it as a distributed matrix, and this didn’t solve the problem.
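One rough way to test the copying hypothesis is to time how long it takes to ship A to each worker. This is only a sketch: the matrix below is a hypothetical stand-in built with sprand (the real A is much larger), and the two local workers are an assumption made to keep the example small. remotecall_fetch serializes A, sends it to the worker, runs nnz there, and returns the result, so the measured time is dominated by the transfer once compilation is out of the way.

```julia
using Distributed
addprocs(2)                       # assumption: two local workers, just for this check
@everywhere using SparseArrays

# Hypothetical stand-in for the real (much larger) sparse matrix A
A = sprand(10_000, 10_000, 1e-3)
println("Size of A in memory: ", Base.summarysize(A) / 1e6, " MB")

for w in workers()
    remotecall_fetch(nnz, w, A)               # warm-up call, so compilation is not timed
    t = @elapsed remotecall_fetch(nnz, w, A)  # ships A to worker w and returns nnz(A)
    println("worker $w: ", round(t; digits = 4), " s to send A")
end
```

If the per-worker time is large compared with the actual work per iteration, the transfer of A is a plausible source of the overhead.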

If someone has some insight or comments about this problem I’d be extremely grateful.

In general, multithreading will have much lower overhead than Distributed, because all threads run in the same process and share memory, so the data does not have to be copied to each worker.
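A minimal sketch of what a threaded version of Solver could look like is below. The names solver_threads and work are hypothetical, and work is only a placeholder for the elided operations over A. The sketch assumes every iteration writes to its own slot of B, so no locking is needed, and that Julia was started with several threads (for example julia -t 4).

```julia
using SparseArrays

# Hypothetical per-index work; stands in for the real operations over A
work(A, i) = sum(@view A[:, i])

function solver_threads(A, N)
    B = Vector{Float64}(undef, N)   # plain Array: all threads see the same memory
    Threads.@threads for i in 1:N
        B[i] = work(A, i)           # each iteration writes only to its own slot
    end
    return B
end

A = sprand(1_000, 8, 0.01)          # assumed small test matrix
B = solver_threads(A, size(A, 2))
```

Here A is never copied: every thread reads the same matrix, and B can be a regular Vector instead of a SharedArray.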

I tried to do so; however, it is not working properly.

When I use Distributed and run ps auxr in the terminal of the respective PC, I can see the different workers (each with a different PID) being executed. However, this is not the case with multithreading. When I run julia -t 4, I just see a single PID instead of the 4 processes that I expected. Is this expected?
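For what it's worth, threads live inside a single operating system process, so a single PID is what multithreading normally looks like. Tools that list threads rather than processes (for example top -H on Linux) show them separately. A small check from inside Julia, as a sketch, is below. The file name check_threads.jl is hypothetical, and the :static schedule is used so that each thread runs exactly one iteration.

```julia
# Run with: julia -t 4 check_threads.jl   (file name is hypothetical)
println("PID: ", getpid())
println("Threads available: ", Threads.nthreads())

# Record which thread executed each iteration
ids = zeros(Int, Threads.nthreads())
Threads.@threads :static for i in 1:Threads.nthreads()
    ids[i] = Threads.threadid()
end
println("Thread ids that ran an iteration: ", ids)
```

If Threads.nthreads() reports 4 and the recorded ids are all different, the threads are active even though ps shows only one process.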