How to set up Julia workers to use a specific number of cores?

indycdrom · January 20, 2020, 7:31pm

I am a beginner, so I am trying my best to describe my issue as clearly as possible.

I installed Julia 1.2.0 on a Windows 2012 machine with 2 CPU sockets (22 cores per CPU) and 512 GB RAM.
I am wondering whether I can start Julia workers that fully utilize the cores, and configure each worker with a specific number of cores. I have the following code to start multiple workers:

using Distributed

x = 44
addprocs(x + 1 - length(procs()))

Then I use @everywhere to load the code into the workers, and remotecall to run the program on each worker:

for i = 1:nworkers(); remotecall(func, workers()[i]); end

My observation is that if I do not use Distributed, the main process uses all my physical cores (50% of my logical cores, so the OS Task Manager shows 50% usage). When I use Distributed, each worker uses only 1 core, and there seems to be no way to specify the number of cores for each worker. Being able to do that would be very beneficial, since each worker processes different data and so has different resource requirements.

Thanks!

tro3 · January 20, 2020, 8:18pm

@indycdrom - I think you are describing the difference between (I am oversimplifying) Distributed (across CPUs) and Threads (across Cores). If you start up Julia with multiple threads (see JULIA_NUM_THREADS in the docs), you should be able to get multiple Cores running on each of your workers.
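
To see how many threads each process actually has, here is a minimal sketch. The worker count of 2 is an arbitrary choice for illustration; it asks every process, including the master, for Threads.nthreads():

```julia
using Distributed

# Start two local workers only if none exist yet (2 is an arbitrary choice).
length(procs()) == 1 && addprocs(2)

# Ask every process, including the master (id 1), how many threads it was started with.
thread_counts = Dict(p => remotecall_fetch(Threads.nthreads, p) for p in procs())

for (p, n) in sort(collect(thread_counts))
    println("process $p: $n thread(s)")
end
```

If a worker reports 1 thread, threaded code running on that worker is limited to a single core.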

ImreSamu · January 20, 2020, 9:01pm

Why not upgrade from Julia 1.2 to 1.3?
Then you can use “composable multi-threaded parallelism”!
See: Announcing composable multi-threaded parallelism in Julia

tro3 · January 20, 2020, 9:08pm

Oh, geez - I missed that. I assumed he was already on 1.3

indycdrom · January 20, 2020, 11:56pm

Thanks for the responses. I will give 1.3 a try.

As I said, I am a beginner, so I would appreciate it if someone could provide a little more detail or some sample code. It would really save my day.

Here is my task: suppose I need to continuously and simultaneously process data for five different cities. The data comes in real time, but not at the same time for every city. So I designed a simple process that detects whether data is available, processes it if so, and then starts over.

using Distributed
x = 5
addprocs(x + 1 - length(procs()))
@everywhere using Pkg
@everywhere Pkg.activate("./julia/myenv")
@everywhere using …

@everywhere function isavailable(cityname::String)::Bool
    …
end

@everywhere function func(cityname::String)
    while true
        if isavailable(cityname)
            println("processing started")
            processdata(cityname)
            println("processing completed")
        end
        sleep(5)   # wait before polling again
    end
end

# cityname is a vector of city names, one per worker
for i = 1:nworkers()
    remotecall(func, workers()[i], cityname[i])
end

As you can see, my code is plain Julia with no macros or other advanced features. It runs without errors, but the issue is that Julia does not use all my cores (based on CPU usage and the measured time to complete the data processing). To be specific, if I run it in the main process without Distributed, processdata takes only 3-4 minutes, but once distributed, it takes 30 minutes to complete even when only one worker is running. So each worker is clearly limited to a small set of cores (I think it is just one, because Task Manager shows only one core being busy).

I always assumed that in a multi-process setting, Julia would optimize the CPU resources for each worker: with only one worker, it would get all the resources, and once I start multiple workers, they would share the resources under some kind of load control (since I have no way to specify the number of cores for each worker).

I am not sure how to use the multi-threading feature in my task. Any suggestion based on my sample code will be greatly appreciated.

ImreSamu · January 21, 2020, 11:05am

My suggestions:

Please update to the latest version now: Julia 1.3.1 (imho this is important: lots of bug fixes and improved multi-processing).
Check the docs: Parallel Computing · The Julia Language
- Important: start Julia with julia -p 22
  "Starting with julia -p n provides n worker processes on the local machine. Generally it makes sense for n to equal the number of CPU threads (logical cores) on the machine. Note that the -p argument implicitly loads module Distributed."
- Add "export JULIA_NUM_THREADS=22" (on Windows, the cmd equivalent is set JULIA_NUM_THREADS=22)
- "By default, Julia starts up with a single thread of execution. This can be verified by using the command Threads.nthreads():"
Please create a minimal working example that we can re-run, examine, and improve.
- imho: your draft example is not perfect.
Please use code formatting (“</> Preformatted text” in the menu, CTRL+SHIFT+C).
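
A minimal sketch of giving workers different thread counts, assuming that local workers inherit the environment of the process that calls addprocs. Here add_threaded_worker is a hypothetical helper (not part of Distributed), and the counts 4 and 1 are arbitrary. On Julia 1.5 and later, passing exeflags="--threads=N" to addprocs is another option.

```julia
using Distributed

# Hypothetical helper: start one local worker with JULIA_NUM_THREADS set to `n`.
# Assumption: the worker process inherits the launcher's environment variables.
function add_threaded_worker(n::Integer)
    withenv("JULIA_NUM_THREADS" => string(n)) do
        first(addprocs(1))   # addprocs returns the ids of the new workers
    end
end

heavy = add_threaded_worker(4)   # worker for a demanding task
light = add_threaded_worker(1)   # worker for a light task

# Confirm what each worker actually got.
counts = Dict(w => remotecall_fetch(Threads.nthreads, w) for w in (heavy, light))
println(counts)
```

The threads only help if the code running on the worker is itself multi-threaded, for example through Threads.@threads or Threads.@spawn.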

As far as I know, full multi-threading support is still a work in progress.

See the "Looking forward" section of Announcing composable multi-threaded parallelism in Julia.
- For example, this is on the TODO list:
  - “Adding parallelism to the standard library. Many common operations like sorting and array broadcasting could now use multiple threads internally.”
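
On the question of how to use threads inside a worker's task, one possible sketch uses Threads.@threads to spread independent chunks of work over the threads of a single process. process_chunks and its sum-of-squares workload are hypothetical stand-ins, not the processdata from this thread.

```julia
using Base.Threads

# Hypothetical stand-in for a per-city workload: sum the squares of each chunk.
function process_chunks(chunks)
    results = Vector{Float64}(undef, length(chunks))
    @threads for i in eachindex(chunks)
        # Each iteration writes only to its own slot, so no locking is needed.
        results[i] = sum(abs2, chunks[i])
    end
    return results
end

chunks = [collect(1.0:10.0) for _ in 1:8]
println("threads available: ", nthreads())
println(process_chunks(chunks))
```

With a single thread the loop still runs correctly, just sequentially, so the same function works on workers started with any thread count.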
