Multithreading and pmap


#1

I have access to a new machine with 2 × 20-core Xeon Gold CPUs that support multithreading. I was playing around with some simple pmap problems, and I did not see a performance jump when going from 40 to 80 workers (hoping to benefit from the multithreading). Does anyone have suggestions on how to best leverage my computing environment with Julia for embarrassingly parallel (i.e., pmap-type) problems?


#2

Have you set the environment variable JULIA_NUM_THREADS, or started Julia with julia -p 40 to get 40 worker processes?


#3

JULIA_NUM_THREADS doesn’t apply to pmap; it only comes into play with Threads.@threads for loops.

@gideonsimpson, why would you expect performance to improve if you go from 40 to 80 workers if you only have 40 logical cores?


#4

I’m aware, but it may be that @gideonsimpson’s code uses that macro somewhere and they expected a performance increase from it. I don’t know, as they didn’t specify in the first post :slight_smile:


#5

I think he meant that he has 40 physical cores which support hyperthreading, i.e., 80 hardware threads.


#6

Even if that’s the case, I actually wouldn’t expect performance to improve if you increase the number of workers past the number of physical cores.


#7

I use addprocs at the beginning of my script, which I call with just julia script1.jl
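For reference, a minimal sketch of that pattern, runnable as julia script1.jl (the function f and the worker count are made-up placeholders, not the actual code from this thread):

```julia
# Sketch of a script that adds its own workers, then uses pmap.
using Distributed

addprocs(4)  # e.g. one worker per physical core

# Anything the workers will call must be defined on every process.
@everywhere f(x) = sum(abs2, x)  # stand-in for an expensive function

inputs = [rand(100) for _ in 1:32]
results = pmap(f, inputs)  # distributes the calls across the workers

println(length(results))
```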


#8

To be clear, pmap distributes computation across processes, which do not share memory; objects must be serialized to be sent between processes, and the processes can run on remote machines. Maybe things aren’t speeding up when you add processes because of this serialization overhead?

Multi-threading, on the other hand, is shared-memory and so does not incur that overhead. It is most easily used with the @threads macro, not pmap, and all threads are on the same machine. Multi-threading is experimental, but mostly works unless you’re doing IO on the threads. So you could try the @threads macro instead, but then you’d need to launch Julia with JULIA_NUM_THREADS set so that the threads actually run on multiple CPU cores; otherwise all @threads will do is run a bunch of tasks on the same core.
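A minimal sketch of that route (the loop body is just a placeholder for real work); launch with something like JULIA_NUM_THREADS=40 julia script.jl:

```julia
using Base.Threads

xs = rand(1_000)
out = similar(xs)

# Iterations are split across however many threads Julia was
# started with (check with nthreads()); all threads share xs and out.
@threads for i in eachindex(xs)
    out[i] = sin(xs[i])^2 + cos(xs[i])^2  # placeholder for real work
end

println(nthreads())
```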


#9

If he goes the threading route, I’d recommend taking a look at KissThreading.jl. Among other things, it offers a tmap! function and initializes a vector of Mersenne Twisters named TRNG that you can use if any of the code generates random numbers.
pmap and tmap! are better than @distributed for and @threads for when the functions being called take a while and there’s some variance in that run time.
The former use dynamic scheduling, and the latter static scheduling.
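A toy way to see the difference, assuming 4 local workers and a deliberately uneven workload (sleep stands in for compute; exact timings will vary by machine):

```julia
using Distributed
addprocs(4)

@everywhere uneven(t) = (sleep(t); t)

# One slow task mixed in with many fast ones.
times = [1.0; fill(0.05, 30)]

# Warm up both paths so compilation doesn't pollute the timings.
pmap(uneven, fill(0.0, 4))
@sync @distributed for i in 1:4
    uneven(0.0)
end

# Dynamic: pmap hands out one task at a time as workers free up,
# so the slow task doesn't hold up a batch of fast ones.
t_dynamic = @elapsed pmap(uneven, times)

# Static: @distributed splits 1:31 into fixed chunks up front, so the
# worker that draws the slow task also owns a share of the fast ones.
t_static = @elapsed @sync @distributed for i in 1:length(times)
    uneven(times[i])
end

println((dynamic = t_dynamic, static = t_static))
```

With this mix, the dynamic schedule finishes close to the cost of the single slow task, while the static one is bottlenecked by the worker stuck with the slow task plus its fixed share of fast ones.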

I’ve normally tried threads before going distributed. However, I usually get poor scaling, much worse than OpenMP. It’s probably my fault.