Is 4 the best number of threads for parallel computing with @distributed?

MTone · May 25, 2020, 4:01pm

I am doing a numerical calculation in Julia, and I recently tested the speed of my code when parallelizing the summation with @distributed:
[Attached image: Julia1]
(This is just part of my code. The attached part is enclosed in another loop, so Sm is a local variable and can be used directly in the \eta loop. I am sure my code is correct and gives correct results.)
Then I recorded the running time for different numbers of threads (1, 4, 8, 16 and 32):
(9:52, 7:14, 6:54, 6:50 and 7:34 in terms of “HH:MM”, respectively)
It looks to me that going from 1 thread to 4 threads gives the biggest improvement. As the number of threads increases further, the performance does not improve much, and with 32 threads it even took longer than with 4 threads.
I am sure that I allocated enough threads when running this code on my cluster. I used “export JULIA_NUM_THREADS=Value” (Value = 1, 4, 8, 16, 32) in my scripts to ensure that Julia starts with that many threads. I also used:
"N_t = nthreads()
println("Number of Threads = ", N_t)"
in my code to check and confirm that I did allocate that many threads.
So my questions are:

Has anyone else found that the speedup from @distributed in Julia does not scale linearly with the number of threads? Is my case special or common?
If my case is common, is there an explanation for why the running time is not 1/4 with 4 times as many threads, and why 4 threads seem to give the biggest improvement?
I am very new to Julia and to programming in general. Do other languages show similar behavior in parallel computing?

Any answer is welcome. Thanks for reading and replying.

tbeason · May 25, 2020, 4:11pm

Linear scaling is the best possible outcome (not really, it is possible to get superlinear scaling). It is rarely observed. The speedup you will see depends on your architecture and the nature of your program. There are not really any universal, clear-cut truths to parallel programming that I know of.

MTone · May 25, 2020, 4:34pm

Thank you tbeason! I used to think that linear scaling is universal.

lungben · May 25, 2020, 7:35pm

@distributed is for multiprocessing, whereas the JULIA_NUM_THREADS environment variable is for multithreading.
For multiprocessing with @distributed, use addprocs() to add worker processes.
Alternatively, use Threads.@threads or Threads.@spawn for multithreading.
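
As a rough illustration of the multiprocessing route, here is a minimal sketch. The summand `term`, the worker count of 4, and the range length are placeholders chosen for the example, not taken from the original code. It adds worker processes with addprocs(), defines the summand on every worker with @everywhere, and lets @distributed with a (+) reducer split the loop across the workers.

```julia
using Distributed

# Add worker processes only if none exist yet.
# The count of 4 is an arbitrary choice for this sketch.
if nprocs() == 1
    addprocs(4)
end

# nworkers() reports worker processes; it is unrelated to JULIA_NUM_THREADS.
println("Worker processes = ", nworkers())

# Hypothetical summand standing in for the real per-term calculation.
# @everywhere makes the definition available on every worker process.
@everywhere term(k) = 1.0 / k^2

# With a (+) reducer, @distributed splits the range across the workers
# and adds up their partial results on the calling process.
total = @distributed (+) for k in 1:1_000_000
    term(k)
end

println("Sum = ", total)
```

If nworkers() prints 1 in a script that was meant to run in parallel, no worker processes were added and the @distributed loop runs on a single process, whatever JULIA_NUM_THREADS is set to. Starting Julia with `julia -p 4` is another way to get 4 local workers without calling addprocs().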

MTone · June 11, 2020, 4:56pm

Thank you lungben! I rewrote my code with addprocs() to add worker processes, and it became much faster. I just noticed that previously I was running my code on only one process.
Also, I realized that there are 2 ways of doing parallel computing in Julia. One is Threads.@threads or Threads.@spawn, used with $env:JULIA_NUM_THREADS = <nthreads>, which works with shared memory. The other is Distributed.@distributed with addprocs() for multiprocessing, which distributes the job across several processes, each with its own memory.
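
For comparison, a minimal sketch of the shared-memory route could look like the following. The function name `threaded_sum` and the summand `term` are hypothetical names made up for this example. The range is split into one chunk per thread, and each iteration writes only to its own slot of a shared array, so no locking is needed.

```julia
using Base.Threads

# Hypothetical summand standing in for the real per-term calculation.
term(k) = 1.0 / k^2

# Sum f(k) for k in 1:n using Threads.@threads.
# All threads share the `partials` array, but each iteration writes
# only to its own index, which avoids a data race.
function threaded_sum(f, n)
    chunks = collect(Iterators.partition(1:n, cld(n, nthreads())))
    partials = zeros(length(chunks))
    @threads for i in eachindex(chunks)
        s = 0.0
        for k in chunks[i]
            s += f(k)
        end
        partials[i] = s
    end
    return sum(partials)
end

println("Number of threads = ", nthreads())
println("Sum = ", threaded_sum(term, 1_000_000))
```

Here the thread count does come from JULIA_NUM_THREADS (or, from Julia 1.5 on, the `-t` command-line flag), so nthreads() is the right check for this version but not for the @distributed one.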
