My code looks something like this. First, I have a file where I define my functions:

```
# Functions.jl file
# Here we import the relevant packages and define all the functions with @everywhere
using Distributed
@everywhere using Distributed, SharedArrays

@everywhere function Solver(A, N)
    B = SharedArray{Float64}(N)
    @sync @distributed for i = 1:N
        ... # Some operations over A that depend on i. We save the results of these operations in B
    end
    return B
end
```

Then I have a second file that calls the functions:

```
using DelimitedFiles # provides writedlm
include("Functions.jl")
for j = 1:M # sequential for loop
    A = ... # generates sparse array A. This array is very large.
    B = Solver(A, N)
    writedlm("Data.csv", B)
end
```

With this code I experience a massive overhead. For nworkers <= 3 everything works as expected: each "Data.csv" file is produced in around 5 min. (Here nworkers coincides with N, the size of the distributed for loop, so we can take nworkers = N.)

However, for nworkers > 3 it takes around 25 min to produce one file. This time does not depend on nworkers once nworkers > 3: writing a file takes as long for N = nworkers = 5 as for N = nworkers = 32.
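To narrow down where the time goes, each phase of one iteration could be timed separately. The following is only a minimal sketch: `generate_A` and `solve_stub` are hypothetical stand-ins for the real matrix generation and for `Solver`, and the sizes are arbitrary small values chosen so that it runs quickly.

```julia
using SparseArrays, DelimitedFiles

# Hypothetical stand-ins for the real matrix generation and solver (assumptions).
generate_A(n) = sprand(n, n, 0.01)
solve_stub(A, N) = [sum(A[:, i]) for i in 1:N]

# @timed returns a named tuple: .value is the result, .time the elapsed seconds.
gen = @timed generate_A(500)
A = gen.value
sol = @timed solve_stub(A, 3)
B = sol.value
path = tempname() # temporary file, so nothing in the working directory is overwritten
wrt = @timed writedlm(path, B)

println("generate: ", gen.time, " s, solve: ", sol.time, " s, write: ", wrt.time, " s")
```

If the same three measurements were taken around the real code, a jump from 5 min to 25 min should show up in exactly one of them, which would indicate whether the distributed loop itself is responsible.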

I have no idea what could be going on. One of my hypotheses was that the input sparse matrix A was being copied N times. However, my last attempt at a solution was to declare it as a distributed matrix, and this didn't solve the problem.
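One way to put a number on the copying hypothesis is to measure how many bytes a single serialized copy of A occupies. This is a rough sketch, assuming that A is captured by the loop body and shipped to every worker; `serialized_bytes` is a hypothetical helper, and the small random matrix is only a stand-in for the real one.

```julia
using Serialization, SparseArrays

# Hypothetical helper: number of bytes produced when `x` is serialized,
# which approximates the size of one copy sent to a worker.
function serialized_bytes(x)
    io = IOBuffer()
    serialize(io, x)
    return length(take!(io))
end

A = sprand(1_000, 1_000, 0.01) # small stand-in for the real, very large matrix
bytes_per_copy = serialized_bytes(A)
println("approx. bytes per copy of A: ", bytes_per_copy)
```

Multiplying this figure by nworkers gives an estimate of the data moved per call to `Solver` if every worker really does receive its own copy.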

If anyone has some insight or comments about this problem, I'd be extremely grateful.