Pmap use of processor cores

abx · June 10, 2019, 4:26pm

I have a simple question about pmap and I can’t find a clear answer anywhere.

Suppose I have a series of pmap’d processes and a few of them are very heavy and need most of the machine’s resources. Can the heavy tasks use all the cores and memory they need, or does pmap impose some limit (e.g. one core per process)?

Thanks in advance

MaximilianJHuber · June 10, 2019, 5:53pm

I do not think that there are any “limitations” imposed by pmap. If you have 2 workers and want to apply an expensive function to 10 elements of an array, each of the 2 workers may execute multi-threaded BLAS calls, custom code, etc., while both use a lot of memory.

But if both workers simultaneously work on one of the “heavy” tasks and need, let’s say, 6GB of RAM each while your machine only has 8GB, you will run into trouble.

If you can explain your specific application in more detail (with a working example), maybe we can advise you on what kind of parallel computing is the best fit.
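
To make the setup concrete, here is a minimal sketch of the situation described above: 2 workers, 10 inputs, a few of them much heavier than the rest. The function `simulate` and the input sizes are made up for illustration; they are not from the original question.

```julia
using Distributed

# Start two local worker processes (the count is arbitrary for this sketch).
addprocs(2)

# Hypothetical workload: the cost grows with n, so a few inputs are "heavy".
# Each call also records which process ran it.
@everywhere function simulate(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return (worker = myid(), n = n, value = s)
end

sizes = [10, 10, 10_000_000, 10, 20_000_000, 10, 10, 10, 10, 10]

# pmap returns the results in the same order as the inputs.
results = pmap(simulate, sizes)

for r in results
    println("n = ", r.n, " ran on worker ", r.worker)
end
```

Printing the worker id per element is a cheap way to see how the work was actually spread out before looking for bottlenecks elsewhere.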

abx · June 10, 2019, 7:41pm

@MaximilianJHuber thanks so much for the answer!

What I was seeing was the pmap operation not being able to make use of all the computing power of my machine. Since I have a mix of very light operations and some very heavy ones, I was wondering if there was a limit on core usage for pmap processes.
If that is not the case, then I probably have to look for bottlenecks inside the pmap’d operations, I would guess…

jpsamaroo · June 10, 2019, 9:37pm

I don’t think pmap has the “smarts” you’re looking for; it’s (probably) just a simple map where each element is assigned to a Julia process at scheduling time (i.e. not dynamically allocated when other iterations complete and balanced based on load). EDIT: This is wrong, but the following message still stands

If you were to implement such functionality, ~~it would likely be considered out of scope of pmap; however,~~ there are many people who I’m sure would appreciate such functionality (myself included). If you have some ideas for how to implement this, Dagger.jl would be a great place to discuss the design/implementation, as Dagger already has the necessary foundation to support intelligent reallocation of work at runtime.

under-Peter · June 10, 2019, 9:49pm

Pmap has a keyword batch_size whose default value is one; see the Distributed Computing section of the Julia documentation. This means that work is allocated dynamically: when a worker is done with its task, it tries to get a new one until all are completed.

But iirc each worker is limited to one core, so if you want to use more cores, initialize more workers or increase the number of threads in BLAS or similar. That might be wrong though.
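
A small sketch of what dynamic allocation looks like in practice, assuming two local workers. Note that the keyword is spelled `batch_size` in pmap's signature. The function `timed_task` and the durations are hypothetical, and `sleep` only stands in for real work.

```julia
using Distributed

addprocs(2)

# Hypothetical task: pretend to work for `seconds`, then report the worker id.
@everywhere function timed_task(seconds)
    sleep(seconds)
    return myid()
end

# One heavy item followed by nine light ones.
durations = [2.0; fill(0.1, 9)]

# Default batch_size = 1: each free worker pulls the next single item.
ids = pmap(timed_task, durations)

# Count how many items each worker ended up processing.
counts = Dict(w => count(id -> id == w, ids) for w in workers())
println(counts)

# Larger batches cut communication overhead, but a batch containing the
# heavy item keeps its worker busy for longer.
ids_batched = pmap(timed_task, durations; batch_size = 5)
```

With the default setting one would expect the worker that picks up the heavy item to process fewer items overall, though the exact split depends on timing.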

abx · June 11, 2019, 1:40pm

@under-Peter
Thanks, I will look into batch_size.

> But iirc each worker is limited to one core

This is actually what I am trying to get to the bottom of… Am I a victim of the Mandela effect?

abx · June 11, 2019, 1:42pm

@jpsamaroo
I am actually not looking for a smarter pmap, just for a worker to be able to get all the resources it needs to perform a task. To my understanding, this should be the expected behavior…

jpsamaroo · June 11, 2019, 2:42pm

Unfortunately, scheduling arbitrary work so that it makes full use of all available CPUs while also being efficient is not an easy problem to solve, even for synthetic tasks.

If you can provide some details on what you’re trying to do (an MWE plus any data that accompanies it), then there’s the chance we can help you either tune pmap for your problem, restructure your problem to be more amenable to efficient pmap execution, or find a better work scheduling algorithm to better suit your needs.

MaximilianJHuber · June 11, 2019, 3:03pm

One core per worker is the default configuration, but addprocs takes the argument enable_threaded_blas, which makes the workers execute linear algebra in a multi-threaded way. If, say, you want to run an @threads for loop, make sure the JULIA_NUM_THREADS environment variable is correctly set before you create the workers, because they inherit that configuration.
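
A rough sketch of how one might check both settings on the workers. It assumes Julia 1.6 or newer (for `BLAS.get_num_threads`), and it passes `--threads=2` through `exeflags` as an alternative to exporting JULIA_NUM_THREADS before starting Julia. The worker and thread counts are arbitrary.

```julia
using Distributed
using LinearAlgebra

# Two local workers with threaded BLAS enabled and 2 Julia threads each.
# Assumption: `exeflags = "--threads=2"` is used here instead of the
# JULIA_NUM_THREADS environment variable.
addprocs(2; enable_threaded_blas = true, exeflags = "--threads=2")

@everywhere using LinearAlgebra

# Ask every worker how many BLAS threads and Julia threads it has.
for w in workers()
    blas_threads = remotecall_fetch(() -> BLAS.get_num_threads(), w)
    julia_threads = remotecall_fetch(() -> Threads.nthreads(), w)
    println("worker ", w, ": BLAS threads = ", blas_threads,
            ", Julia threads = ", julia_threads)
end
```

Running this once with and once without `enable_threaded_blas = true` should show whether the BLAS thread count on the workers is what you expect.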

abx · June 11, 2019, 3:31pm

That is interesting. Why isn’t enable_threaded_blas true by default?

jpsamaroo · June 11, 2019, 6:08pm

Because using multithreaded BLAS with multiple Julia processes on the same host simultaneously would cause a drop in computational efficiency for each running BLAS operation. Many people launch a worker for each core on their host.
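
If you do want threaded BLAS on several workers, one way to avoid oversubscribing the machine is to split the logical cores between them. This is only a sketch of that idea, assuming Julia 1.6 or newer and two local workers; the even split is a heuristic, not something pmap or addprocs does for you.

```julia
using Distributed
using LinearAlgebra

addprocs(2; enable_threaded_blas = true)
@everywhere using LinearAlgebra

# Sys.CPU_THREADS is the number of logical cores on this machine.
# Give each worker an equal share, but never fewer than one thread.
threads_per_worker = max(1, Sys.CPU_THREADS ÷ nworkers())

# Apply the budget on every worker.
for w in workers()
    remotecall_fetch(BLAS.set_num_threads, w, threads_per_worker)
end

# Report what each worker's BLAS library actually uses afterwards.
for w in workers()
    println("worker ", w, ": ",
            remotecall_fetch(BLAS.get_num_threads, w), " BLAS threads")
end
```

With one worker per core this budget comes out as 1 thread each, which matches the default behavior described above.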

abx · June 12, 2019, 1:12am

Does JULIA_NUM_THREADS play any role within a pmap?

jpsamaroo · June 12, 2019, 5:41am

No, that only changes the number of non-BLAS Julia threads, which I assume you are not using.
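
For completeness, this is roughly what using Julia threads inside a pmap’d function could look like. It is a sketch under some assumptions: Julia 1.5 or newer, workers started with 2 threads each via `exeflags`, and non-empty inputs. The helper `threaded_sum` is hypothetical.

```julia
using Distributed

# Start two workers with 2 Julia threads each (arbitrary numbers).
addprocs(2; exeflags = "--threads=2")

# Hypothetical helper: split the input into one chunk per thread,
# sum each chunk in its own task, then combine the partial sums.
# Assumes `xs` is non-empty.
@everywhere function threaded_sum(xs)
    chunk_len = cld(length(xs), Threads.nthreads())
    tasks = [Threads.@spawn sum(chunk)
             for chunk in Iterators.partition(xs, chunk_len)]
    return sum(fetch.(tasks))
end

# pmap spreads the inputs over the workers; each call then uses
# the Julia threads of whichever worker runs it.
totals = pmap(threaded_sum, [1:1000, 1:2000])
println(totals)
```

If the pmap’d function contains no threaded code like this, the number of Julia threads on the workers makes no difference to it.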

MaximilianJHuber · June 12, 2019, 3:06pm

@abx let me reiterate: if you show us some code, some MWE, or at least describe what your heavy function does, we can give you well-informed advice.
