Julia Threads vs BLAS threads

I was searching for information on multithreading in Julia and came across this interesting post.

What I found is that OpenBLAS manages its own thread pool and uses it unless we call BLAS.set_num_threads(1). So when we start Julia with JULIA_NUM_THREADS=4 and call BLAS.set_num_threads(4), the process does not end up with 4 x 4 = 16 threads but with 4 + 4 = 8.
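To check the two settings from inside a session, a minimal sketch like the following should work. It assumes a Julia version where BLAS.get_num_threads() is available (1.6 or later, as far as I know), and the variable names are just for illustration.

```julia
using LinearAlgebra

# Julia's own threads, fixed at startup (e.g. via JULIA_NUM_THREADS=4).
julia_threads = Threads.nthreads()

# Threads OpenBLAS may use inside one BLAS call; adjustable at runtime
# with BLAS.set_num_threads(n).
blas_threads = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)
```

The two numbers are configured independently, which is why the thread counts add rather than multiply.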

When I run the manymul_threaded! benchmark:

top - 00:15:38 up 65 days,  7:55,  6 users,  load average: 5.44, 2.78, 1.43
Threads:   8 total,   7 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 75.3 us, 24.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8004416 total,  3736952 free,  3768084 used,   499380 buff/cache
KiB Swap: 32767868 total, 31780740 free,   987128 used.  3825440 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24866 alkorang  20   0 4002112 2.950g  15396 R 64.5 38.6   4:52.49 julia
24871 alkorang  20   0 4002112 2.950g  15396 R 60.5 38.6   3:44.93 julia
24872 alkorang  20   0 4002112 2.950g  15396 R 59.0 38.6   3:44.58 julia
24868 alkorang  20   0 4002112 2.950g  15396 R 57.4 38.6   0:50.98 julia
24873 alkorang  20   0 4002112 2.950g  15396 R 55.1 38.6   3:44.77 julia
24870 alkorang  20   0 4002112 2.950g  15396 R 51.6 38.6   0:49.21 julia
24869 alkorang  20   0 4002112 2.950g  15396 R 49.2 38.6   0:51.68 julia
24867 alkorang  20   0 4002112 2.950g  15396 S  0.0 38.6   0:00.00 julia

When I run randn(5000, 5000) * randn(5000, 5000):

top - 00:16:41 up 65 days,  7:56,  6 users,  load average: 4.49, 3.12, 1.65
Threads:   8 total,   4 running,   4 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.6 us,  0.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  8004416 total,  3537528 free,  3963780 used,   503108 buff/cache
KiB Swap: 32767868 total, 31780752 free,   987116 used.  3627800 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24871 alkorang  20   0 4197428 3.137g  16064 R 99.7 41.1   4:17.85 julia
24872 alkorang  20   0 4197428 3.137g  16064 R 99.7 41.1   4:16.56 julia
24866 alkorang  20   0 4197428 3.137g  16064 R 99.3 41.1   5:23.14 julia
24873 alkorang  20   0 4197428 3.137g  16064 R 98.7 41.1   4:17.31 julia
24867 alkorang  20   0 4197428 3.137g  16064 S  0.0 41.1   0:00.00 julia
24868 alkorang  20   0 4197428 3.137g  16064 S  0.0 41.1   1:18.36 julia
24869 alkorang  20   0 4197428 3.137g  16064 S  0.0 41.1   1:06.98 julia
24870 alkorang  20   0 4197428 3.137g  16064 S  0.0 41.1   1:06.37 julia

(This output comes from the top command on Linux: top -H -p <pid>)

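This tells us that calling a multithreaded function does not necessarily create new threads; whether it does depends on the implementation. The same 8 thread IDs appear in both listings, and only the set of busy threads changes.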

Similarly, calling BLAS.set_num_threads(2) does not destroy 2 threads from the BLAS thread pool. It just leaves 2 of them idle during computation.
With BLAS.set_num_threads(2); randn(5000, 5000) * randn(5000, 5000):

top - 00:43:46 up 65 days,  8:23,  6 users,  load average: 1.03, 0.32, 0.54
Threads:   8 total,   2 running,   6 sleeping,   0 stopped,   0 zombie
%Cpu(s): 50.3 us,  0.0 sy,  0.0 ni, 49.6 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  8004416 total,  3718088 free,  3781652 used,   504676 buff/cache
KiB Swap: 32767868 total, 31789756 free,   978112 used.  3809092 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24866 alkorang  20   0 4002112 2.954g  16112 R 99.9 38.7   8:46.08 julia
24871 alkorang  20   0 4002112 2.954g  16112 R 99.9 38.7   7:39.11 julia
24867 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   0:00.00 julia
24868 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   1:18.36 julia
24869 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   1:06.98 julia
24870 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   1:06.37 julia
24872 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   6:52.61 julia
24873 alkorang  20   0 4002112 2.954g  16112 S  0.0 38.7   6:53.96 julia

The exception is BLAS.set_num_threads(1): OpenBLAS then does not use its thread pool at all and each BLAS call runs on the thread that made it, so 4 Julia threads give 4 x 1 = 4 busy threads.
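Here is a rough sketch of that single-threaded-BLAS pattern. Note that threaded_products! is a hypothetical helper I made up for illustration, not the manymul_threaded! from the earlier post, and the matrix sizes are arbitrary.

```julia
using LinearAlgebra

# Hypothetical helper: each Julia thread handles some of the products.
function threaded_products!(Cs, As, Bs)
    Threads.@threads for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])   # in-place Cs[i] = As[i] * Bs[i]
    end
    return Cs
end

old_blas_threads = BLAS.get_num_threads()
BLAS.set_num_threads(1)   # parallelism comes from the Julia threads only

As = [randn(200, 200) for _ in 1:8]
Bs = [randn(200, 200) for _ in 1:8]
Cs = [zeros(200, 200) for _ in 1:8]
threaded_products!(Cs, As, Bs)

BLAS.set_num_threads(old_blas_threads)   # restore the previous setting
```

Watching this with top -H -p <pid> while using larger matrices should show only the Julia threads busy, though I have only reasoned about this sketch and not measured it.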
