Regarding the multithreaded performance of OpenBLAS

carstenbauer · January 30, 2022, 8:29am

This has already been mentioned by me and @jpsamaroo in the other discourse thread you linked. Basically, OPENBLAS_NUM_THREADS=1 (you have a typo there) is special because it makes each OpenBLAS computation run on the (Julia) thread that called it. For OPENBLAS_NUM_THREADS>1 the behavior changes qualitatively: OpenBLAS creates its own pool of OpenBLAS threads and uses it to run the BLAS computations triggered by any of the Julia threads (there is only a single pool of OpenBLAS threads, irrespective of how many Julia threads you have). Hence, assuming Threads.nthreads() == 16, setting OPENBLAS_NUM_THREADS=1 will effectively spread your BLAS computations across all 16 Julia threads, whereas setting OPENBLAS_NUM_THREADS=2 will make all of your BLAS computations run on only 2 separate OpenBLAS threads. That’s why you see such horrible performance for your 16/8 case, for example.
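As a minimal sketch of how to inspect and change these settings from inside a running session (assuming Julia 1.6 or newer with the default OpenBLAS backend; the variable names are only illustrative):

```julia
using LinearAlgebra

# Number of Julia threads; fixed at startup (e.g. `julia -t 16` or JULIA_NUM_THREADS).
julia_threads = Threads.nthreads()

# Current number of BLAS threads, i.e. the size of the single shared OpenBLAS pool.
blas_threads_before = BLAS.get_num_threads()

# Runtime counterpart of starting Julia with OPENBLAS_NUM_THREADS=1:
# BLAS calls then run on whichever Julia thread issued them.
BLAS.set_num_threads(1)
blas_threads_after = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads before/after: ", blas_threads_before, " / ", blas_threads_after)
```

Unlike the environment variable, which is read when OpenBLAS is loaded, `BLAS.set_num_threads` can be called at any point in the session.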

As for your other question: in general, multithreading your computation with Julia threads (if possible) and using OPENBLAS_NUM_THREADS=1 should be better than using only a single Julia thread and OPENBLAS_NUM_THREADS=16. The main point is that you can parallelize your specific application much more effectively than OpenBLAS can, since OpenBLAS can only parallelize the BLAS parts. However, as with every “rule of thumb”, there are exceptions, and the outcome can depend on the computation at hand. (BTW, in your case the rule of thumb seems to hold: compare 16/1 (538) to 1/16 (900).)
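To make the two strategies concrete, here is a rough sketch on a hypothetical workload of independent matrix products (the function names, matrix sizes, and the workload itself are assumptions for illustration, not the code from the original question):

```julia
using LinearAlgebra

# Strategy A: parallelize at the application level with Julia threads.
# Intended to be used with a single BLAS thread.
function multiply_all_threaded!(Cs, As, Bs)
    Threads.@threads for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])  # each product runs on the calling Julia thread
    end
    return Cs
end

# Strategy B: plain serial loop; any parallelism comes from the OpenBLAS pool.
function multiply_all_serial!(Cs, As, Bs)
    for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])
    end
    return Cs
end

n, k = 64, 8  # hypothetical problem size
As = [rand(n, n) for _ in 1:k]
Bs = [rand(n, n) for _ in 1:k]
Cs_threaded = [zeros(n, n) for _ in 1:k]
Cs_serial = [zeros(n, n) for _ in 1:k]

old_blas_threads = BLAS.get_num_threads()

# Strategy A: Julia threads + one BLAS thread.
BLAS.set_num_threads(1)
multiply_all_threaded!(Cs_threaded, As, Bs)

# Strategy B: serial loop + the previous BLAS thread count.
BLAS.set_num_threads(old_blas_threads)
multiply_all_serial!(Cs_serial, As, Bs)
```

Which variant is faster depends on the matrix sizes and the machine, so it is worth timing both (for example with `@time` or BenchmarkTools) on the real workload.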
