Attaching workers to cores


I am trying to run a program using Distributed. The workflow is to have
each remote worker execute the same function (with different arguments),
called via remotecall.
The problem is that initially all the workers seem to run on different cores,
but at some point all the execution switches to a single core. Could you please
help me keep the different workers on different logical cores?

Here is what I am doing:

using Distributed
@everywhere include("rijke_tangent_state_estimation.jl")
f = Array{Future,1}(undef, nworkers())
for i = 1:nworkers()
        p = workers()[i]  # worker pids start at 2; pid 1 is the master
        filename = string("../data/rijke_exp3_", string(i), ".jld")
        put!(RemoteChannel(p), filename)
        # remotecall is already asynchronous, so @async is not needed
        f[i] = remotecall(assimilate_parameter_and_trajectory, p, filename)
        println("sent remote call to ", p)
end
for i = 1:nworkers()
        wait(f[i])  # block until worker i's call has finished
        println(i, " is done..")
end

How quickly does your function run? Would using pmap instead make more sense?

It’s an expensive function: each evaluation takes about 3 hours on a single core of a 1.8 GHz i7 CPU.
The really pressing question is why the above code switches to single-core evaluation, given that it started out using as many cores as there are workers.
Thanks for letting me know about pmap! I can try it and let you know.
Does pmap have a different behavior for parallelism?

I don’t know what the internals are like for pmap, but the idea is that it automatically takes care of the scheduling for the long running functions.
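To illustrate the scheduling idea (a minimal sketch with a made-up `slow_square` function standing in for a long-running computation): pmap hands one argument to each free worker and dispatches the next argument as soon as a worker finishes, so slow and fast evaluations are balanced automatically.

```julia
using Distributed
addprocs(4)  # spawn 4 worker processes

# Stand-in for an expensive function; sleep mimics a long computation.
@everywhere function slow_square(x)
    sleep(0.1)
    return x^2
end

# pmap distributes the 8 inputs over the 4 workers dynamically and
# returns the results in input order.
results = pmap(slow_square, 1:8)
```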

Thanks, I just tried pmap and it ends up doing the same thing.
For the first few seconds, 16 cores are being used, and afterward, only one is being used.

It could be that your functions are returning more quickly than you thought. Are you sure the function takes about 3 hours? Can you post more code and also show how you called pmap? Thanks

Thanks for responding!

I am sure because I tried the function serially and it works perfectly fine.
It takes as long as I mentioned.
The problem is definitely in how I am parallelizing. Here is how I use pmap:

using Distributed
@everywhere include("rijke_tangent_state_estimation.jl")
filenames = Array{String,1}(undef, nworkers())
for i = 1:nworkers()
        filenames[i] = string("../data/rijke_exp3_", string(i), ".jld")
end
pmap(assimilate_parameter_and_trajectory, filenames)

OK, I noticed that with pmap, or using remotecall as in my original post, the code stops with a segmentation fault after a while. This is due to an out-of-memory error, which should not occur because the function allocates less memory than the per-core memory of my cluster node.
However, if all 16 workers run on the same core, which is what was happening, I would not be surprised by a seg fault due to an out-of-memory error.

Yes, that was my next question. Make sure there is enough memory for each Julia worker process running the code. Furthermore, all the results from pmap are returned to the head node (or the main process it was run on), so you must make sure there is enough memory available there as well to collect the results.

For example, my head node where I launch Julia and submit my pmap code only has 32 GB of memory. My compute nodes have 128 GB of memory. So I have to make sure that when my workers ship the objects back to pmap, the total is less than 32 GB in memory.
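One way to stay under that limit (a sketch; `summarize` is a hypothetical stand-in for the real function, and the numbers are made up) is to return only a small summary of each result instead of the full arrays, so very little data is shipped back to the head node:

```julia
using Distributed
addprocs(2)

# Hypothetical stand-in for the real computation: it builds a large
# intermediate array but returns only a small named tuple.
@everywhere function summarize(filename)
    data = rand(10^6)                      # large intermediate result
    return (file = filename, mean = sum(data) / length(data))
end

summaries = pmap(summarize, ["run_1.jld", "run_2.jld"])
```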


Thank you so much @affans!
This was super useful:

I removed the return values of the function (I just wrote the arrays to disk),
and this seemed to fix everything. I think returning all the arrays at once made the main process run out of RAM. Thanks for your help! :slight_smile:
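For reference, that write-to-disk pattern can be sketched like this (file names and the computation are placeholders; in practice the worker would serialize its result arrays, e.g. to a .jld file):

```julia
using Distributed
addprocs(2)

@everywhere function process_and_save(outfile)
    result = rand(100, 100)        # placeholder for the real computation
    # Write the large array on the worker and return nothing, so no big
    # object is shipped back to the head node.
    open(outfile, "w") do io
        write(io, result)
    end
    return nothing
end

outfiles = [joinpath(tempdir(), "rijke_out_$(i).bin") for i in 1:2]
pmap(process_and_save, outfiles)
```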

In HPC the normal method to pin processes is to use the numactl tools (Linux-specific).
The Slurm scheduler can do this too.
This actually leads me to ask: how is process pinning accomplished in Julia? You do ask a good question!

As you say, it is important. Also, if processes switch between cores, it can cause cache invalidation.
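As far as I know Julia has no built-in API for pinning Distributed workers, but on Linux one could, in principle, pin each worker after launch by asking it for its OS pid and calling taskset on it. A rough sketch, assuming taskset is on the PATH:

```julia
using Distributed
addprocs(2)

# Linux-only sketch: pin each Distributed worker to its own logical core.
# getpid is executed *on the worker* to obtain its OS-level process id.
if Sys.islinux() && Sys.which("taskset") !== nothing
    for (core, p) in enumerate(workers())
        ospid = remotecall_fetch(getpid, p)
        run(`taskset -cp $(core - 1) $ospid`)  # pin to logical core core-1
    end
end
```

With Slurm, the equivalent is usually handled for you via its CPU-binding options, so manual pinning is mostly relevant for ad hoc runs.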
