Huge distributed overhead

greivin · June 18, 2024, 3:54pm

My code looks something like this. First, I have a file where I define my functions:

```julia
# Functions.jl file
# Here we import the relevant packages and define all the functions with @everywhere
using Distributed
@everywhere using SharedArrays

@everywhere function Solver(A, N)
    B = SharedArray{Float64}(N)
    @sync @distributed for i = 1:N
        # ... some operations over A that depend on i; the results are saved in B
    end
    return B
end
```

Then I have a second file that calls the functions:

```julia
using DelimitedFiles   # for writedlm
include("Functions.jl")

for j = 1:M   # sequential for loop
    A = ...   # generates the sparse array A; this array is very large
    B = Solver(A, N)
    writedlm("Data.csv", B)
end
```

With this code I see a massive overhead. For nworkers <= 3 (this coincides with the length of the distributed for loop, N, so we can take nworkers = N), everything works as expected: each “Data.csv” file is produced in around 5 min.
However, for nworkers > 3 it takes around 25 min to produce one file. Beyond nworkers = 3 this time no longer depends on nworkers: writing a file takes as long with N = nworkers = 5 as with N = nworkers = 32.
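To see which part of each outer iteration grows from about 5 to about 25 min, it can help to time the stages separately: building `A`, solving, and writing the file. A minimal sketch, where `make_A` and `solve` are hypothetical stand-ins for the elided code in the post; in the real script `solve` would be `Solver(A, N)` with the workers already added.

```julia
using DelimitedFiles, SparseArrays

# Hypothetical stand-ins for the elided code in the post:
# make_A builds the sparse input and solve plays the role of Solver(A, N).
make_A(n) = sprand(n, n, 1e-3)
solve(A, N) = [sum(A[:, i]) for i in 1:N]

# Time each stage of one outer iteration separately.
# @timed returns a NamedTuple whose `value` is the result and `time` is in seconds.
function timed_iteration(n, N, path)
    g = @timed make_A(n)                 # g.value is A
    s = @timed solve(g.value, N)         # s.value is B
    w = @timed writedlm(path, s.value)
    return (gen = g.time, solve = s.time, write = w.time)
end

path = joinpath(mktempdir(), "Data.csv")
t = timed_iteration(2_000, 8, path)
println(t)
```

Running this once with nworkers <= 3 and once with more workers would show whether the extra time sits inside the solve step or elsewhere.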

I have no idea what could be going on. One of my hypotheses was that the input sparse matrix A was being copied N times. My latest attempt at a fix was to declare it as a distributed matrix, but this did not solve the problem.
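One way to check the copying hypothesis: the Distributed manual notes that variables used inside a `@distributed` loop are copied and broadcast to each process, so the `A` captured by the loop body in `Solver` is serialized for every worker that takes part in a call. The sketch below, using a random matrix as a hypothetical stand-in for the real `A`, measures roughly how many bytes one such copy costs.

```julia
using Random, Serialization, SparseArrays

# Number of bytes serialize() produces for x; roughly what Distributed
# sends over the wire each time x is copied to a worker.
function serialized_bytes(x)
    io = IOBuffer()
    serialize(io, x)
    return length(take!(io))
end

Random.seed!(1)
A = sprand(10_000, 10_000, 1e-3)   # hypothetical stand-in for the real (larger) A
bytes = serialized_bytes(A)
println("nnz(A) = ", nnz(A), ", serialized size ≈ ", round(bytes / 2^20; digits = 2), " MiB")
```

Multiplying this size by the number of workers and by the number of outer iterations M gives a rough estimate of how much data is moved in total.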

If someone has some insight or comments about this problem I’d be extremely grateful.

Oscar_Smith · June 18, 2024, 4:32pm

In general, multithreading has much lower overhead than Distributed, because threads can share data instead of copying it between processes.
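A minimal sketch of that suggestion, assuming the work for each `i` only reads `A` and writes `B[i]`: start Julia with several threads (e.g. `julia -t 4`) and replace the `SharedArray` with an ordinary `Vector`. `solver_threaded` and its column-sum body are hypothetical placeholders for the elided operations.

```julia
using Base.Threads, SparseArrays

# Hypothetical threaded counterpart of Solver: all threads read the same A
# in shared memory (nothing is copied) and iteration i writes only B[i],
# so no locking is needed.
function solver_threaded(A::AbstractMatrix, N::Integer)
    B = Vector{Float64}(undef, N)
    @threads for i in 1:N
        # Placeholder work (an assumption): sum of column i of A.
        B[i] = sum(A[:, i])
    end
    return B
end

A = sprand(1_000, 1_000, 0.01)
B = solver_threaded(A, 8)
println("threads: ", nthreads(), ", B = ", B)
```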

greivin · June 19, 2024, 11:46am

I tried that, but it is not working properly.

When I use Distributed and run ps auxr in a terminal on the respective PC, I can see the different workers (with different PIDs) running. This is not the case with multithreading: when I run julia -t 4 I see just a single PID instead of the 4 processes that I expect. Is this expected?
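In general this is expected: the threads started with `julia -t 4` all live inside one OS process, whereas each Distributed worker is a separate process with its own PID. The sketch below runs a threaded loop and records `getpid()` and `Threads.threadid()` for each iteration; only the thread ids differ. On Linux, `ps -T -p <PID>` (or `top -H`) lists the individual threads of that single process.

```julia
using Base.Threads

# Record which OS process and which Julia thread ran each iteration.
n = 4 * nthreads()
pids = Vector{Int}(undef, n)
tids = Vector{Int}(undef, n)
@threads for k in 1:n
    pids[k] = getpid()    # OS process ID
    tids[k] = threadid()  # Julia thread that ran iteration k
end

println("nthreads() = ", nthreads())
println("distinct PIDs: ", unique(pids))          # a single PID
println("thread ids used: ", sort(unique(tids)))
```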
