How to set up number of threads appropriately based on Hardware?

Amro · June 30, 2021, 8:20pm

Hello,
Based on my computer’s performance, what is the maximum number I can set in Julia for:
1- Multiprocessors: is it addprocs(12)?
2- Multithreading: is it JULIA_NUM_THREADS=12?

Thanks,

StatisticalMouse · June 30, 2021, 8:34pm

Your hardware’s going to run max. 12 threads at a time, possibly in one program or several. It is going to run other programs in addition to Julia as well. My thinking would be that fewer than 12 would be optimal, but you should benchmark your code.

Elrod · June 30, 2021, 8:37pm

You can check Sys.CPU_THREADS.
Note that it isn’t type stable:

julia> systhreads() = Sys.CPU_THREADS
systhreads (generic function with 1 method)

julia> @code_warntype systhreads()
Variables
  #self#::Core.Const(systhreads)

Body::Any
1 ─ %1 = Base.Sys.CPU_THREADS::Any
└──      return %1

and also that whether it’s better to use the number of physical cores vs logical threads varies by application.
It’s also more complicated on CPUs with a mix of big and little cores. On the M1, which has 4 big and 4 little cores, I find much better performance when using 4 threads than with 8.

Amro · June 30, 2021, 8:42pm

Thanks for your reply!

So, basically the number of threads (JULIA_NUM_THREADS) depends on the number of logical processors, right?
How about the number of processes? Does it also depend on the number of logical processors (i.e., addprocs())?

Amro · June 30, 2021, 8:44pm

Thanks for your reply!
I get the same output as you. Does this mean I have only one thread?

Elrod:

julia> systhreads() = Sys.CPU_THREADS
systhreads (generic function with 1 method)

julia> @code_warntype systhreads()
Variables
  #self#::Core.Const(systhreads)

Body::Any
1 ─ %1 = Base.Sys.CPU_THREADS::Any
└──      return %1

nilshg · July 1, 2021, 6:52am

No, Chris’s point was just that the function isn’t type stable, which is what @code_warntype shows you. To get the number of threads, you just want to check the variable itself:

julia> Sys.CPU_THREADS
4
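To make the distinction concrete, here is a minimal sketch that prints both the hardware thread count and the number of threads the current Julia session was started with. The two can differ: the second is fixed at startup by the `--threads` flag or the `JULIA_NUM_THREADS` environment variable.

```julia
using Base.Threads

# Number of logical CPUs (hardware threads) the machine reports.
println("logical CPUs:  ", Sys.CPU_THREADS)

# Number of threads this Julia session actually has. It is fixed at startup,
# e.g. `julia --threads=12`, or by setting JULIA_NUM_THREADS=12 in the
# environment before launching Julia.
println("Julia threads: ", nthreads())
```

If the second number is 1, Julia was started without a thread setting, regardless of what the hardware offers.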

Tamas_Papp · July 1, 2021, 11:10am

Incidentally, why is that? I would have expected that it is always an Int.

Elrod · July 1, 2021, 12:19pm

I think because it is a non-const global, as it has to be initialized in the init block:
https://github.com/JuliaLang/julia/blob/93d375cb021379a7a57aad05df86c5925cfaebcf/base/sysinfo.jl#L99
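If the instability matters in a hot path, one common workaround is a type assertion at the point of use. This is only a sketch; `systhreads_stable` is a made-up name for illustration.

```julia
# The `::Int` assertion tells the compiler what type to expect, so the
# function's return type is inferred as Int even if the global is untyped.
systhreads_stable() = Sys.CPU_THREADS::Int

# Inspect the inferred return type without reading @code_warntype output.
println(only(Base.return_types(systhreads_stable, Tuple{})))
```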

On the subject of getting more detailed information, Hwloc.jl provides some:

julia> Sys.CPU_THREADS
36

julia> Hwloc.num_virtual_cores()
36

julia> Hwloc.num_physical_cores()
18

julia> Hwloc.topology()
Machine (125.48 GB)
    Package L#0 P#0 (125.48 GB)
        NUMANode (125.48 GB)
        L3 (24.75 MB)
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#0 P#0
                PU L#0 P#0
                PU L#1 P#18
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#1 P#1
                PU L#2 P#1
                PU L#3 P#19
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#2 P#2
                PU L#4 P#2
                PU L#5 P#20
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#3 P#3
                PU L#6 P#3
                PU L#7 P#21
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#4 P#4
                PU L#8 P#4
                PU L#9 P#22
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#5 P#8
                PU L#10 P#5
                PU L#11 P#23
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#6 P#9
                PU L#12 P#6
                PU L#13 P#24
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#7 P#10
                PU L#14 P#7
                PU L#15 P#25
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#8 P#11
                PU L#16 P#8
                PU L#17 P#26
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#9 P#16
                PU L#18 P#9
                PU L#19 P#27
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#10 P#17
                PU L#20 P#10
                PU L#21 P#28
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#11 P#18
                PU L#22 P#11
                PU L#23 P#29
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#12 P#19
                PU L#24 P#12
                PU L#25 P#30
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#13 P#20
                PU L#26 P#13
                PU L#27 P#31
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#14 P#24
                PU L#28 P#14
                PU L#29 P#32
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#15 P#25
                PU L#30 P#15
                PU L#31 P#33
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#16 P#26
                PU L#32 P#16
                PU L#33 P#34
            L2 (1.0 MB) + L1 (32.0 kB) + Core L#17 P#27
                PU L#34 P#17
                PU L#35 P#35

But it’s no help telling big vs small cores (AFAIK):

julia> versioninfo()
Julia Version 1.8.0-DEV.92
Commit d1145d4569* (2021-06-29 01:41 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin20.5.0)
  CPU: Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, cyclone)
Environment:
  JULIA_NUM_THREADS = 4

julia> Sys.CPU_THREADS
8

julia> Hwloc.num_virtual_cores()
8

julia> Hwloc.num_physical_cores()
8

julia> Hwloc.topology()
Machine (3.41 GB)
    Package L#0 P#0 (3.41 GB)
        NUMANode (3.41 GB)
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#0 P#0
            PU L#0 P#0
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#1 P#1
            PU L#1 P#1
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#2 P#2
            PU L#2 P#2
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#3 P#3
            PU L#3 P#3
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#4 P#4
            PU L#4 P#4
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#5 P#5
            PU L#5 P#5
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#6 P#6
            PU L#6 P#6
        L2 (4.0 MB) + L1 (64.0 kB) + Core L#7 P#7
            PU L#7 P#7

Amro · July 1, 2021, 1:19pm

Thank you very much all! Now I know the maximum available number of threads on my computer.

Can I consider the number returned by Sys.CPU_THREADS as the maximum number of processes that I can define in Julia as well, i.e. addprocs(11) + master node = 12 processes in total?
In other words, is the concept of threads similar to worker processes in Julia?

Elrod · July 1, 2021, 1:53pm

addprocs() with no argument defaults to Sys.CPU_THREADS workers.
Distributed normally runs code on the worker processes, so you’d want addprocs(12) for 12 workers.
You can add however many workers you want (memory allowing), but you’ll probably get the best performance with 6 or 12.
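A small sketch of how the counts relate, using only the Distributed standard library. The worker count of 2 is arbitrary and just keeps the example light; substitute whatever you want to benchmark.

```julia
using Distributed

before = nprocs()        # 1 in a fresh session: only the master process
new_ids = addprocs(2)    # start 2 local workers; returns their process ids

# nworkers() counts workers only, nprocs() counts the master as well.
println("workers: ", nworkers(), ", processes: ", nprocs())

# Ask every worker for its own id to confirm that it responds.
results = [remotecall_fetch(myid, w) for w in workers()]
println(results)

rmprocs(new_ids)         # shut the extra workers down again
```

Each worker is a separate Julia process with its own memory, which is why the practical limit is usually RAM rather than core count.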

dlakelan · July 1, 2021, 2:09pm

The processor you have will only ever run 6 things at once. The extra threads come from “hyperthreading”, which lets a core switch rapidly between two threads. This can reduce context switching time and allow the CPU to be utilized somewhat more efficiently, but it is usually a benefit only when you have a lot of cache misses or other stalls.

For efficient numerical code the hyperthreading rarely helps much and can even hurt. So you should try both 6 and 12 and see what goes faster for your workload.
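A rough sketch of such a comparison follows. Julia's thread count is fixed at startup, so the idea is to run the same script once with `julia -t 6` and once with `julia -t 12`. The function `threaded_sum` and the `sin` workload are stand-ins for illustration, not anything from your code, and the input is assumed to be non-empty.

```julia
using Base.Threads

# Split `xs` into one chunk per task, reduce each chunk on its own task,
# then combine the partial results.
function threaded_sum(f, xs; ntasks = nthreads())
    chunks = Iterators.partition(xs, cld(length(xs), ntasks))
    tasks = map(chunks) do c
        Threads.@spawn sum(f, c)
    end
    return sum(fetch, tasks)
end

xs = rand(10^6)
threaded_sum(sin, xs)                  # warm-up call so compilation is not timed
t = @elapsed threaded_sum(sin, xs)
println(nthreads(), " threads: ", t, " s")
```

For anything more serious than a quick check, a proper benchmarking package will give more stable numbers than a single `@elapsed`.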

Amro · July 1, 2021, 2:23pm

So, setting a proper number of threads in Julia should be based on the number of logical processors in the computer, right? In my case 12, as I have
julia> Sys.CPU_THREADS
12
The above is also true for number of worker processes, right?

Is memory the limit because each process needs its own memory space, so a higher number of processes leads to higher memory use, which must not exceed the machine's capacity?

dlakelan · July 1, 2021, 2:42pm

No, it should be based on what you want to accomplish. For example, a friend was running some MCMC procedures while editing his manuscript. He had 6 cores and 12 threads. I advised him to run 4 chains on 4 threads so that he had two real cores still available for interaction while editing the manuscript.

If he had run 12 threads his machine would have been unusable for editing. Even if he’d run 6 threads it would not have been interactive, because of the 12 hyperthreads only 6 can run at any one time.
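That rule of thumb can be written down as a tiny helper. This is only a sketch: `suggested_threads` is a hypothetical name, and the default of 2 hardware threads per core is an assumption that holds for typical hyperthreaded CPUs but not for every machine (the M1 output earlier in the thread shows one PU per core).

```julia
# Estimate physical cores from the logical count, then keep some cores free
# for interactive use. `smt` is the assumed number of hardware threads per core.
function suggested_threads(logical = Sys.CPU_THREADS; smt = 2, reserve = 2)
    physical = max(1, logical ÷ smt)
    return max(1, physical - reserve)
end

# 12 logical CPUs -> 6 physical cores, minus 2 kept free -> 4 threads.
println(suggested_threads(12))
```

Treat the result as a starting point for benchmarking, not as the answer.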

Amro · July 1, 2021, 2:53pm

Does this mean that the 6 threads are running on 6 cores (one thread on each core), so there are no cores left for editing (in your example)?

dlakelan · July 1, 2021, 4:16pm

Yes, more or less, and if you make 12 threads they still only have 6 running at any time.

greg_plowman · July 1, 2021, 9:37pm

Since __init__ is run only once after module loading, then eval might be OK?

module Test
export test, test2

function __init__()
    val = rand(1:10)
    global TEST = val
    @eval const global TEST2 = $val
end

test() = TEST
test2() = TEST2
end
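For comparison, a different and fairly common idiom avoids eval altogether: keep a `const` binding to a typed `Ref` and fill it in `__init__`. This is a sketch of that alternative, not what Base does; the module and function names are invented for illustration.

```julia
module RefInit
export hw_threads

# The binding is const and the container is typed, so reads infer as Int.
const CPU_COUNT = Ref{Int}(0)

# Runs once when the module is loaded; only the contents of the Ref change.
function __init__()
    CPU_COUNT[] = Sys.CPU_THREADS
end

hw_threads() = CPU_COUNT[]
end

using .RefInit
println(hw_threads())
```

The cost is an extra dereference on every read, which is usually negligible.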

carstenbauer · July 1, 2021, 9:51pm

What do you get when calling hwloc directly from the command line? The big/small core information might be available in the objects’ properties and we’re just not printing it. Maybe worth trying to extract a core from the topology and looking at its fields, i.e. something like collectobjects(:Core, gettopology())[1].attr. (For me the output is empty though.)

jzr · July 1, 2021, 10:34pm

Is there a heuristic for which kind of application I have, or is it unpredictable?

carstenbauer · July 1, 2021, 10:35pm

Apparently, since hwloc version 2.4, lstopo seems to have a --cpukinds option. And in 2.6 they have specifically worked on distinguishing high and low performance cores for M1 Macs, see the NEWS file in the open-mpi/hwloc repository on GitHub. I’ll take a look and will try to update Hwloc.jl accordingly tomorrow.

(Update: https://github.com/JuliaParallel/Hwloc.jl/issues/57)
