Does distributed performance depend on the number of workers?

I’m running code that solves a linear system Ax = b for a fixed matrix A and several vectors b_i, where i runs from 1 to N. I set N to match the number of available cores on my system, so I’m solving N = 32 (for example) linear systems in parallel.
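For reference, the serial structure is essentially the following (placeholder sizes and random data just to fix notation; the real A and b_i come from my actual problem and each solve is much more expensive):

```julia
using LinearAlgebra

n = 1000                  # placeholder dimension
N = 32                    # number of right-hand sides, matched to the number of cores

A = rand(n, n) + n * I    # fixed matrix (random placeholder, shifted to stay well-conditioned)
B = rand(n, N)            # column i is the vector b_i

# Serial version: solve A x_i = b_i one right-hand side at a time
X = similar(B)
for i in 1:N
    X[:, i] = A \ B[:, i]
end
```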

I do this by putting @everywhere in front of every required function, defining the matrix of solutions as a SharedArray (which holds every solution vector x_1, x_2, …, x_32), and adding the macros @sync @distributed in front of the for loop that runs over the i = 1:N (= 32) vectors b_i.
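Schematically, the distributed part looks like the sketch below (same placeholder data as above; `solve_one` and `addprocs(16)` are stand-ins for my real per-vector routine and for the worker count provided by Slurm):

```julia
using Distributed
addprocs(16)                       # stand-in for the workers provided by Slurm

@everywhere using LinearAlgebra, SharedArrays

# Every function needed on the workers is defined with @everywhere;
# solve_one is a placeholder for my real per-vector computation.
@everywhere function solve_one(A, b)
    return A \ b
end

n = 1000
N = nworkers()                     # one right-hand side per worker, e.g. 16 or 32
A = rand(n, n) + n * I
B = rand(n, N)

# Matrix of solutions as a SharedArray: column i will hold x_i
X = SharedArray{Float64}(n, N)

# A and B are captured by the loop body and sent to each worker
@sync @distributed for i in 1:N
    X[:, i] = solve_one(A, B[:, i])
end
```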

However, when I submit the job to the cluster with Slurm, the running time is much larger than I expected compared to a single run on my personal machine. On my machine the whole code (for one vector b_1) takes ~500 s, while on the cluster with N = 16 (and hence 16 workers under @sync @distributed) it takes longer than 2700 s. Strangest of all, when I send more tasks than available workers (32 vectors b_i with @sync @distributed over 16 workers), it is already finished after ~2700 s.

What could be going on? Is there a way to avoid this overhead and produce faster results?