Correct way of parallelizing on a HPC remote cluster machine

I want to know the correct way of running a Julia script (part of which requires parallelization) on a remote HPC cluster.
My PBS script looks like:

#!/bin/bash -l
#PBS -l walltime=01:00:00,nodes=1:ppn=24,mem=62gb
export JULIA_NUM_THREADS=24
julia test.jl

and test.jl looks like:

using Distributed
addprocs(24)

@everywhere function sqr(x)
    return x^2
end

y = pmap(x->sqr(x), 1:1e5)

I am using 1 node with 24 cores, as shown in the PBS script above.
My question is: do I need to declare export JULIA_NUM_THREADS=24 in the PBS script (since the default number of threads is 1, as reported by Threads.nthreads()), or is the specification ppn=24 sufficient?
I tried with and without export JULIA_NUM_THREADS=24 and didn't notice any difference in the execution time of my actual code, so I am not sure whether my code is getting properly parallelized.
Any suggestions on the correct way to specify threads and processes in the PBS and/or Julia script when using pmap for parallelization would be appreciated.

I am using Julia 1.5.0.

You’re using distributed memory parallelism, so you do not need threads. pmap will automatically run the function on free workers. However, for such a small function you won’t see any advantage from using more workers, as communication time will swamp the gains from parallel execution.
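To make the difference visible, here is a minimal sketch with a deliberately slowed-down function standing in for a genuinely expensive computation (slow_sqr and the sleep call are just illustrative stand-ins), plus a quick way to confirm that workers were actually added:

using Distributed
addprocs(4)                      # a few local workers, just for illustration

@everywhere function slow_sqr(x)
    sleep(0.1)                   # stand-in for an expensive computation
    return x^2
end

println("workers: ", nworkers()) # quick sanity check that processes were added
@time map(slow_sqr, 1:40)        # serial: roughly 40 * 0.1 s
@time pmap(slow_sqr, 1:40)       # parallel: roughly 40 * 0.1 s / 4 workers, plus overhead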

By the way, since you’re using PBS, consider looking into ClusterManagers.jl if you want to use more than one node.
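For reference, a minimal sketch of that route; the exact ClusterManagers.jl call and its keyword arguments may differ between versions and clusters, so treat addprocs_pbs here as an assumption and check the package README:

using Distributed, ClusterManagers

addprocs_pbs(48)     # assumed API: request 48 workers through the PBS queue

@everywhere heavy(x) = sum(abs2, rand(1_000)) + x   # placeholder for the real computation
y = pmap(heavy, 1:10_000)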


You might actually want to use 24 threads on each node, but then you’ll have to call some threading operations yourself…

For example, within your real function you might use @threads to parallelize the loop across threads.
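Something like this minimal sketch (sqr_threaded is just an illustrative name; the loop body is where your real computation would go):

using Base.Threads

function sqr_threaded(xs)
    out = zeros(length(xs))
    @threads for i in eachindex(xs)    # splits the loop over Threads.nthreads() threads
        out[i] = xs[i]^2
    end
    return out
end

y = sqr_threaded(1:100_000)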

Thank you @jishnub. Yes, for now I simply request 1 node and a certain number of cores on that node through the PBS script, and then add processes using addprocs in my Julia code. This seems to be working well so far. Also, yes, for this small function sqr parallelization will not help much, but my actual code has another function that does a large computation.

Regarding multiple nodes, I do plan to use more than one, but since I am not sure how pmap actually behaves when the cores are spread across different nodes, I have been avoiding it so far. As you said, I will probably look into ClusterManagers.jl for this.

Hi Daniel,

I guess what you have mentioned is a different method of parallelization using the @threads macro. This falls under multi-threading rather than distributed computing as far as I know (please correct me if I am wrong). And to use the @threads macro, I do need to specify JULIA_NUM_THREADS in my PBS script before launching Julia, as pointed out by @jishnub. Right?
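As a quick check on my end, I suppose I can print the thread count at the top of test.jl to confirm that the export in the PBS script actually took effect:

@show Threads.nthreads()   # should report 24 if JULIA_NUM_THREADS=24 was exported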

Additionally, I have tried using the @threads macro but for some reason couldn’t get parallelization as good as with pmap, so I have been sticking to pmap for now.
Is multi-threading preferred over distributed processing (or the other way round) in certain scenarios?

pmap will map your problem across multiple machines. Within each machine you can then map your problem across multiple threads. They complement each other.

Okay. I get what you’re saying, but I am not sure how I would achieve this in actual code. For example, say I requested 2 nodes with 24 cores each, set 48 threads using JULIA_NUM_THREADS=48, and added some processes using addprocs. How would I use pmap and @threads together? Should I just write them both in front of the function that needs to be parallelized? Can you please give a simple example of doing so?

First, since your nodes have 24 CPUs each, you should set JULIA_NUM_THREADS close to 24 per node (not 48), at most maybe 26, as sometimes a little oversubscription can help.

Then, how to use them together?

pmap will map across machines, so write the function that you pass to pmap so that it uses @threads internally; then each machine will use multiple threads.
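A rough sketch of how the two levels fit together (not tested on a real cluster; the node discovery from $PBS_NODEFILE, the --threads=24 worker flag, and expensive_kernel are assumptions you would adapt to your setup):

using Distributed

# One worker per node listed in $PBS_NODEFILE, each started with 24 threads
# (the --threads/-t option exists from Julia 1.5 onwards).
nodes = unique(readlines(ENV["PBS_NODEFILE"]))
addprocs([(n, 1) for n in nodes]; exeflags=`--threads=24`)

@everywhere using Base.Threads
@everywhere expensive_kernel(x) = sum(sin, 1:10_000) * x   # placeholder for the real work

# Each chunk goes to one worker (one node); @threads spreads it over that node's threads.
@everywhere function work_on_chunk(chunk)
    out = zeros(length(chunk))
    @threads for i in eachindex(chunk)
        out[i] = expensive_kernel(chunk[i])
    end
    return out
end

chunks = [i:min(i + 24_999, 100_000) for i in 1:25_000:100_000]
results = pmap(work_on_chunk, chunks)   # pmap across nodes, @threads within each node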

Okay. I will probably try this out on some of my code and see how it works. Thank you!