Hello,
Based on my computer’s performance, what is the maximum number I can set in Julia for:
1- Multiprocessing: is it addprocs(12)?
2- Multithreading: is it JULIA_NUM_THREADS=12?
Thanks,
Your hardware can run at most 12 threads at a time, whether in one program or spread across several, and it will be running other programs alongside Julia. My guess is that fewer than 12 would be optimal, but you should benchmark your code.
You can check Sys.CPU_THREADS.
Note that it isn’t type stable:
julia> systhreads() = Sys.CPU_THREADS
systhreads (generic function with 1 method)
julia> @code_warntype systhreads()
Variables
#self#::Core.Const(systhreads)
Body::Any
1 ─ %1 = Base.Sys.CPU_THREADS::Any
└── return %1
and also that whether it’s better to use the number of physical cores vs logical threads varies by application.
It’s also more complicated on CPUs with a mix of big and little cores. On the M1, which has 4 big and 4 little cores, I find much better performance when using 4 threads than with 8.
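Since the advice above is to benchmark, here is a minimal sketch of one way to do it. The workload `work` and the helper `chunked_sum` are hypothetical names made up for illustration, so substitute your own code. Note that the number of Julia threads is fixed at startup (`julia --threads N` or the `JULIA_NUM_THREADS` environment variable), so comparing thread counts means restarting Julia with a different setting; within one session you can only vary how many tasks you spawn.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical CPU-bound workload, used only for illustration.
work(r) = sum(x -> sin(x)^2, r)

# Split 1:n into roughly `ntasks` chunks and process each chunk in its own task.
function chunked_sum(n, ntasks)
    chunks = Iterators.partition(1:n, cld(n, ntasks))
    tasks = map(chunks) do c
        @spawn work(c)
    end
    return sum(fetch, tasks)
end

chunked_sum(10^5, 1)  # warm up so that compilation is not timed

for ntasks in unique((1, 2, nthreads()))
    t = @elapsed chunked_sum(10^7, ntasks)
    println("tasks = $ntasks (Julia threads = $(nthreads())): $(round(t; digits = 3)) s")
end
```

Running the same script under `julia --threads 6` and `julia --threads 12` (or 4 and 8 on an M1) should show which setting suits a given workload better.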
Thanks for your reply!
I get the same output as you. Does this mean I have only one thread?
No, Chris’s point was just that the function isn’t type stable, which is what @code_warntype shows you. To get the number of threads, you just want to check the variable itself:
julia> Sys.CPU_THREADS
4
Incidentally, why is that? I would have expected that it is always an Int.
I think it is because it is a non-const global, as it has to be initialized in the __init__ block:
https://github.com/JuliaLang/julia/blob/93d375cb021379a7a57aad05df86c5925cfaebcf/base/sysinfo.jl#L99
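If the instability matters in your own code, one possible workaround is to assert the type where the global is read. This is only a sketch: `systhreads` is the same illustrative name used earlier in the thread, and newer Julia versions may already declare a type for this global, in which case the assertion is harmless but redundant.

```julia
# The type assertion tells the compiler that the value is an Int,
# so callers of systhreads() get a concrete return type.
systhreads() = Sys.CPU_THREADS::Int

# Inspect what inference concludes for a call with no arguments.
println(Base.return_types(systhreads, Tuple{}))
```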
On the subject of getting more detailed information, Hwloc.jl provides some:
julia> Sys.CPU_THREADS
36
julia> Hwloc.num_virtual_cores()
36
julia> Hwloc.num_physical_cores()
18
julia> Hwloc.topology()
Machine (125.48 GB)
Package L#0 P#0 (125.48 GB)
NUMANode (125.48 GB)
L3 (24.75 MB)
L2 (1.0 MB) + L1 (32.0 kB) + Core L#0 P#0
PU L#0 P#0
PU L#1 P#18
L2 (1.0 MB) + L1 (32.0 kB) + Core L#1 P#1
PU L#2 P#1
PU L#3 P#19
L2 (1.0 MB) + L1 (32.0 kB) + Core L#2 P#2
PU L#4 P#2
PU L#5 P#20
L2 (1.0 MB) + L1 (32.0 kB) + Core L#3 P#3
PU L#6 P#3
PU L#7 P#21
L2 (1.0 MB) + L1 (32.0 kB) + Core L#4 P#4
PU L#8 P#4
PU L#9 P#22
L2 (1.0 MB) + L1 (32.0 kB) + Core L#5 P#8
PU L#10 P#5
PU L#11 P#23
L2 (1.0 MB) + L1 (32.0 kB) + Core L#6 P#9
PU L#12 P#6
PU L#13 P#24
L2 (1.0 MB) + L1 (32.0 kB) + Core L#7 P#10
PU L#14 P#7
PU L#15 P#25
L2 (1.0 MB) + L1 (32.0 kB) + Core L#8 P#11
PU L#16 P#8
PU L#17 P#26
L2 (1.0 MB) + L1 (32.0 kB) + Core L#9 P#16
PU L#18 P#9
PU L#19 P#27
L2 (1.0 MB) + L1 (32.0 kB) + Core L#10 P#17
PU L#20 P#10
PU L#21 P#28
L2 (1.0 MB) + L1 (32.0 kB) + Core L#11 P#18
PU L#22 P#11
PU L#23 P#29
L2 (1.0 MB) + L1 (32.0 kB) + Core L#12 P#19
PU L#24 P#12
PU L#25 P#30
L2 (1.0 MB) + L1 (32.0 kB) + Core L#13 P#20
PU L#26 P#13
PU L#27 P#31
L2 (1.0 MB) + L1 (32.0 kB) + Core L#14 P#24
PU L#28 P#14
PU L#29 P#32
L2 (1.0 MB) + L1 (32.0 kB) + Core L#15 P#25
PU L#30 P#15
PU L#31 P#33
L2 (1.0 MB) + L1 (32.0 kB) + Core L#16 P#26
PU L#32 P#16
PU L#33 P#34
L2 (1.0 MB) + L1 (32.0 kB) + Core L#17 P#27
PU L#34 P#17
PU L#35 P#35
But it’s no help telling big vs small cores (AFAIK):
julia> versioninfo()
Julia Version 1.8.0-DEV.92
Commit d1145d4569* (2021-06-29 01:41 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin20.5.0)
CPU: Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.0 (ORCJIT, cyclone)
Environment:
JULIA_NUM_THREADS = 4
julia> Sys.CPU_THREADS
8
julia> Hwloc.num_virtual_cores()
8
julia> Hwloc.num_physical_cores()
8
julia> Hwloc.topology()
Machine (3.41 GB)
Package L#0 P#0 (3.41 GB)
NUMANode (3.41 GB)
L2 (4.0 MB) + L1 (64.0 kB) + Core L#0 P#0
PU L#0 P#0
L2 (4.0 MB) + L1 (64.0 kB) + Core L#1 P#1
PU L#1 P#1
L2 (4.0 MB) + L1 (64.0 kB) + Core L#2 P#2
PU L#2 P#2
L2 (4.0 MB) + L1 (64.0 kB) + Core L#3 P#3
PU L#3 P#3
L2 (4.0 MB) + L1 (64.0 kB) + Core L#4 P#4
PU L#4 P#4
L2 (4.0 MB) + L1 (64.0 kB) + Core L#5 P#5
PU L#5 P#5
L2 (4.0 MB) + L1 (64.0 kB) + Core L#6 P#6
PU L#6 P#6
L2 (4.0 MB) + L1 (64.0 kB) + Core L#7 P#7
PU L#7 P#7
Thank you very much all! Now, I know the maximum available number of threads in my computer.
Can I also treat the number returned by Sys.CPU_THREADS as the maximum number of processes that I can define in Julia, i.e. addprocs(11) + master node = 12 processes in total?
In other words, is the concept of threads similar to worker processes in Julia?
addprocs defaults to Sys.CPU_THREADS.
Distributed normally runs code on the worker processes, so you’d want addprocs(12) for 12 workers.
You can add however many workers you want (memory allowing), but you’ll probably get the best performance with 6 or 12.
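As a minimal sketch of the worker/master distinction, assuming a fresh Julia session on a single machine (the worker count of 2 and the squaring function are arbitrary choices for illustration):

```julia
using Distributed

# Start 2 local worker processes; each one is a separate Julia process
# with its own memory, which is why memory limits the number of workers.
addprocs(2)

# nprocs() counts the master process too; nworkers() counts only workers.
println("processes: ", nprocs(), ", workers: ", nworkers())

# pmap distributes the calls over the worker processes.
results = pmap(x -> x^2, 1:4)
println(results)

# Shut the workers down again when they are no longer needed.
rmprocs(workers())
```

Calling `addprocs()` with no argument would instead start `Sys.CPU_THREADS` workers, as mentioned above.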
The processor you have will only ever run 6 things at once. The extra threads come from “hyperthreading”, which may occasionally allow your CPU to switch rapidly between running one thing on a core and running another on the same core. This can reduce context switching time and allow the CPU to be utilized somewhat more efficiently, but it is usually only a benefit when you have a lot of cache misses or other stalls.
For efficient numerical code the hyperthreading rarely helps much and can even hurt. So you should try both 6 and 12 and see what goes faster for your workload.
So, setting a proper number of threads in Julia should be based on the number of logical processors in the computer, right? In my case that is 12, since I have
julia> Sys.CPU_THREADS
12
The above is also true for number of worker processes, right?
Is this because processes need to have their own memory partitions, so higher number of processes will lead to higher memory occupation which should not exceed its capacity, right?
No, it should be based on what you want to accomplish. For example, a friend was running some MCMC procedures while editing his manuscript. He had 6 cores and 12 threads. I advised him to run 4 chains on 4 threads so that he had two real cores still available for interaction while editing the manuscript.
If he had run 12 threads, his machine would have been unusable for editing. Even if he’d run 6 threads, it would not have been interactive, because of the 12 hyperthreads only 6 can run at any one time.
Does this mean that the 6 threads are running on 6 cores (one thread on each core), so there is no core available for editing (in your example)?
Yes, more or less, and if you make 12 threads they still only have 6 running at any time.
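The arithmetic in that example can be sketched as a small helper. The function name `suggested_threads` and its keyword arguments are hypothetical, and the default of 2 hardware threads per core is an assumption that does not hold on every CPU (for instance, the M1 discussed above has none).

```julia
# Estimate physical cores from logical threads, then keep some cores free
# for interactive work. `smt` is the assumed number of hardware threads
# per core; `reserve` is the number of cores to leave unused.
function suggested_threads(logical; smt = 2, reserve = 2)
    physical = logical ÷ smt
    return max(1, physical - reserve)
end

# 12 logical threads, 6 cores, 2 cores kept free for editing.
println(suggested_threads(12))
```

Treat the result as a starting point for benchmarking rather than a rule.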
Since __init__ is run only once after module loading, then eval might be OK?
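module Test
export test, test2
function __init__()
    val = rand(1:10)
    # Non-const global: its type can change, so accesses are inferred as Any.
    global TEST = val
    # Const global created at init time via eval, with the value spliced in.
    @eval const global TEST2 = $val
end
test() = TEST
test2() = TEST2
end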
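For comparison, a commonly used alternative that avoids eval is a const `Ref` that is filled in during `__init__`. This is a sketch of that general pattern, not something proposed in the thread or taken from Base; the module name `RefInit` and the names inside it are made up for illustration.

```julia
module RefInit

# The binding is const and the container has a concrete element type,
# so reads through NTHREADS[] are inferred as Int.
const NTHREADS = Ref{Int}(0)

# Runs once when the module is loaded and fills in the runtime value.
function __init__()
    NTHREADS[] = Sys.CPU_THREADS
end

hw_threads() = NTHREADS[]

end # module

println(RefInit.hw_threads())
```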
What do you get when calling hwloc directly from the command line? The big/small core information might be available in the object properties and we’re just not printing it. It may be worth extracting a core from the topology and looking at its fields, i.e. something like collectobjects(:Core, gettopology())[1].attr. (For me the output is empty though.)
Is there a heuristic for which kind of application I have, or is it unpredictable?
Apparently, since hwloc version 2.4, lstopo seems to have a --cpukinds option. And in 2.6 they have specifically worked on distinguishing high- and low-performance cores for M1 Macs; see hwloc/NEWS at master in the open-mpi/hwloc repository on GitHub. I’ll take a look and will try to update Hwloc.jl accordingly tomorrow.
(Update: https://github.com/JuliaParallel/Hwloc.jl/issues/57)