BLAS performance testing for Julia 1.8

peakflops(10000) gives me 1.94e11 on Julia 1.6.2 and 4.44e11 on

Version 1.8.0-DEV.702 (2021-10-11)
Commit bc89d8365d (0 days old master)

This is on a Ryzen 5950X machine (16 cores, 32 threads). BLAS picks 32 threads in the latest build.

I recommend calling peakflops multiple times instead of using a larger size, which takes a lot more memory and an awful lot more time just to give a worse peak flops:

julia> @time maximum(peakflops() for _ in 1:10)
  1.283524 seconds (43.27 k allocations: 614.688 MiB, 4.74% gc time, 1.99% compilation time)
1.5731878913261862e11

julia> @time maximum(peakflops(10_000) for _ in 1:10)
206.243965 seconds (415.71 k allocations: 14.926 GiB, 0.62% gc time, 0.03% compilation time)
1.0870695287018651e11

I get higher results from larger values.

Are you on a laptop that's overheating and thus throttling?

I use 16_000, and this is where I see the best results.

julia> using LinearAlgebra

julia> @time maximum(peakflops() for _ in 1:10)
  0.752822 seconds (3.66 M allocations: 801.841 MiB, 7.95% gc time, 76.95% compilation time)
1.6886556011175947e12

julia> @time maximum(peakflops(10_000) for _ in 1:10)
 11.147155 seconds (615.83 k allocations: 14.936 GiB, 1.96% gc time, 0.54% compilation time)
1.9820846150376135e12

julia> @time peakflops(16_000)
  4.226806 seconds (11 allocations: 3.815 GiB, 0.38% gc time)
2.0390490771767915e12

julia> versioninfo()
Julia Version 1.8.0-DEV.660
Commit 153db7a7a8* (2021-10-05 16:17 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

It only takes a single run for 16_000 to win; it consistently gets over 2e12.

This is with an 18 core CPU that has AVX512.

@tbeason what do you get if you set BLAS.set_num_threads(16)? The 10980XE should be close to 2x faster, not over 4x faster, than the 5950X.
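For example (a minimal sketch; the size and repeat count are arbitrary):

using LinearAlgebra

BLAS.set_num_threads(16)   # one thread per physical core on the 5950X
maximum(peakflops(10_000) for _ in 1:5)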

julia> estimate_peak(; GHz = 4, ncores = (Sys.CPU_THREADS)::Int ÷ 2, fma_per_cycle=2, vector_width=8) = GHz*ncores*fma_per_cycle*2*vector_width # 2 = ops per fma
estimate_peak (generic function with 1 method)

julia> estimate_peak() # *1e9 = 2.3e12 estimated peak for a 10980XE clocked at 4GHz
2304

julia> estimate_peak(ncores=16, vector_width=4) # *1e9, so 1e12 estimated peak
1024

(The *1e9 comes from 1 GHz = 1e9 Hz.)
Meaning you should probably be able to get close to 1e12 instead of < 5e11.

Forcing BLAS to use 16 threads puts the results more in the 4e11 range instead of 4.4e11. Using 16000 instead of 10000 for peakflops does result in a slight increase, but still far short of 1e12.

I'm curious, how about 8 threads?
I'm wondering if a single core complex can get comparable performance.

Also, if you're willing to try Octavian (with a 7980XE):

julia> using Octavian

julia> M = K = N = 10_000; T = Float64; A = rand(T,M,K); B = rand(T,K,N); C0 = Array{T}(undef, M, N);

julia> C1 = @time(A*B);
  1.659304 seconds (45 allocations: 762.942 MiB, 2.00% gc time, 0.51% compilation time)

julia> @time(matmul!(C0,A,B)) ≈ C1
 24.181371 seconds (38.73 M allocations: 2.312 GiB, 1.41% gc time, 94.53% compilation time)
true

julia> @time(matmul!(C0,A,B)) ≈ C1
  1.240348 seconds
true

julia> 2M*K*N / 1.240348
1.6124506993198684e12

julia> 2M*K*N / @elapsed(matmul!(C0,A,B))
1.5091441211696343e12

julia> 2M*K*N / @elapsed(matmul!(C0,A,B))
1.3809532134370098e12

julia> 2M*K*N / @elapsed(matmul!(C0,A,B))
1.6087288779508767e12

julia> BLAS.set_num_threads(36)

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
1.3121943004311746e12

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
1.3067611981202083e12

julia> BLAS.set_num_threads(18)

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
1.5145908595120166e12

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
1.6394751354135144e12

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
1.6367653198510771e12

julia> BLAS.set_num_threads(8)

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
7.772799503515851e11

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
7.769725165120771e11

julia> versioninfo()
Julia Version 1.8.0-DEV.709
Commit 1389c2fc4a* (2021-10-12 16:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz

Octavian got competitive performance with OpenBLAS here; I'm curious whether it does any better on the 5950X or also stumbles.

A side comment / feature request :grinning:: We have JULIA_EXCLUSIVE=1 for compact pinning of Julia threads (i.e. pin 1:N Julia threads to the first 1:N cores). If we had more information about the system (sockets / NUMA domains), we could also offer a "scattered pinning", where Julia threads are pinned to cores from both sockets in an alternating fashion. This can have a big influence on performance (MFlops/s); see e.g. JuliaPerf/BandwidthBenchmark.jl (measuring memory bandwidth using TheBandwidthBenchmark). Also check it out if you just like unicode plots :smiley:.

But let me stop derailing this thread :slight_smile:

It would be important for the BLAS library to be aware of the type of pinning, if possible.
You'd want to divide N among the separate L3 caches, and have all the cores that share an L3 iterate over their block of N together.

Octavian is currently very conservative because it assumes no thread pinning, meaning it'd underperform on systems with a split L3, like the 5950X.
I'm wondering just how badly it underperforms. OpenBLAS seems really bad on the 5950X as well.

Another thing to try on the 5950X is MKL. I'm wondering if that does better.
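Something along these lines should work (a sketch assuming the MKL.jl package is installed; on Julia 1.7+ it swaps the BLAS backend when loaded):

using MKL            # load before benchmarking; swaps the default BLAS to MKL
using LinearAlgebra

BLAS.get_config()    # check that MKL is now the active BLAS backend
maximum(peakflops(10_000) for _ in 1:5)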

Here you go.

julia> Threads.nthreads()
32

julia> using Octavian

julia> M = K = N = 10_000; T = Float64; A = rand(T,M,K); B = rand(T,K,N); C0 = Array{T}(undef, M, N);

julia> C1 = @time(A*B);
  4.784390 seconds (2.42 M allocations: 883.761 MiB, 0.19% gc time, 7.76% compilation time)

julia> @time(matmul!(C0,A,B)) ≈ C1
 14.185721 seconds (28.96 M allocations: 1.500 GiB, 0.80% gc time, 68.90% compilation time)
true

julia> @time(matmul!(C0,A,B)) ≈ C1
  4.304452 seconds
true

julia> 2M*K*N / @elapsed(matmul!(C0,A,B))
4.67596191784391e11

julia> 2M*K*N / @elapsed(matmul!(C0,A,B))
4.658076784902242e11

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
32

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
4.586317908376697e11

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
4.5314859311069586e11

julia> BLAS.set_num_threads(16)

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
4.0128038136403345e11

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
3.914658181149202e11

julia> BLAS.set_num_threads(8)

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
4.0726904378339746e11

julia> 2M*K*N / @elapsed(mul!(C0,A,B))
3.91119468204246e11

julia> versioninfo()
Julia Version 1.8.0-DEV.702
Commit bc89d8365d (2021-10-11 05:42 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver3)

Thanks! I wonder if this is a Windows problem, e.g. maybe it's restricting the process to a single 8-core complex, regardless of the number of threads. Can you use a system monitor to watch core use?

The fact that you get the same performance out of 8 cores as you do out of 16 seems to imply that.

Anyone on Linux with a 5950X?

FWIW, HPC clusters and cloud environments likely already set the process affinity to be compatible with the underlying hardware and your resource request. IMHO, this is a more important limit for us to respect. Luckily, it seems OpenBLAS already respects the process affinity (https://github.com/JuliaLang/julia/pull/42340#issuecomment-932835362).

I thought we disabled CPU affinity in general. Maybe it gets respected if set at the OS level?

Maybe you are referring to https://github.com/JuliaLang/julia/pull/9639? It's a different issue. I was commenting that OpenBLAS respects the user-specified affinity to restrict the number of threads it uses. So it's different from #9639, which fixed the problem that Julia didn't respect the user-specified affinity and even overwrote it.

Hi, thanks so much for making this happen. Can I ask whether the maximum number of threads has been decreased from 4096 to 1024? It seems so:

julia -p 128

returns:

OpenBLAS blas_thread_init: pthread_create failed for thread 127 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 1024 max
ERROR: TaskFailedException

I am on Julia Version 1.8.0-DEV.829 (2021-10-27), Commit d71b77d7b6 (6 days old master). I will try the latest version shortly; I cannot do it right now, hence the question.

Edit 1:
A similar situation occurs with julia -p 64.

Edit 2:
I managed to upgrade. The situation is similar on Version 1.8.0-DEV.875 (2021-11-02), Commit 7eba9c1d76 (0 days old master).

Edit 3:
I investigated the topic a little further. When trying julia -p 128 or julia -p 64 I was getting the readings presented above. As I now understand it, OpenBLAS by default spawns a thread count equal to the number of available logical processors, which is higher than the RLIMIT_NPROC set by default on my machine. I understand that one can:

  • explicitly set the number of OpenBLAS threads, e.g. with export OMP_NUM_THREADS=1 or by starting Julia with OMP_NUM_THREADS=1 julia -p 64 (see the sketch after this list),
    or
  • raise the process limit (RLIMIT_NPROC) per user (on Ubuntu) in /etc/security/limits.conf and/or system-wide, potentially in /etc/sysctl.conf.
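A minimal sketch of the first option from within Julia, assuming locally launched workers inherit the parent process environment (which they do for addprocs on the same machine):

using Distributed

ENV["OMP_NUM_THREADS"] = "1"   # set before spawning; the workers inherit the environment
addprocs(64)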

I'm pretty sure we are going to go with 512 threads on openblas, because of this:

https://github.com/xianyi/OpenBLAS/issues/3403

If you are doing multi-threaded Julia, you probably want openblas to use only 1 thread. We do this in Distributed, but not in multi-threading.
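For example, in a Julia-threads workload (a minimal sketch):

using LinearAlgebra
BLAS.set_num_threads(1)        # keep OpenBLAS single-threaded

results = zeros(Threads.nthreads())
Threads.@threads for i in eachindex(results)
    A = rand(200, 200)
    results[i] = sum(A * A)    # each BLAS call stays on one thread; Julia threads provide the outer parallelism
end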

-viral

I'm pretty sure we are going to go with 512 threads on openblas […]

Hi! Sounds cool. Thanks!

If you are doing multi-threaded Julia, you probably want openblas to use only 1 thread. We do this in Distributed, but not in multi-threading.

Would you have any suggestions regarding the Julia and BLAS setup for running AlphaZero.jl on a CPU-only machine? I am asking because I have tried to consult several very knowledgeable people about it, and it seems there are still some unexplored areas in this field, so I decided to mention the topic to you in the hope that you might be able to offer a suggestion.

The case is that, with a CPU-only setup using multiprocessing and one BLAS thread, I think AlphaZero.jl currently performs the first of the four parts of one training iteration on about 60 physical Ice Lake cores in a similar time to the calculations done on a high-end GPU (V100), which IMO is remarkable (and should potentially be even more remarkable with the next generation of x86 CPUs). However, with such a setup the next three parts run on only one core, so the overall timing for the iteration is rather long. Do you think it could be possible to improve this with the proper Julia / BLAS setup? I am not a professional coder, but I tried to provide as detailed information about my experience with AlphaZero.jl as possible in this thread: Questions on parallelization · Issue #71 · jonathan-laurent/AlphaZero.jl.

Also, taking the opportunity, and since this is partially a thread about the next Julia release: do you plan to fully support AVX512 in the future (AFAIK it is not currently fully supported, though I am not sure about that)? And, if I may ask, what is your position with regard to Julia emitting competitive SVE code on Neoverse N1 / A64FX (AFAIK there are currently some constraints), so that libraries like Octavian.jl could shine on those platforms?

Is this documented somewhere?

What does it mean? That if I call Distributed.addprocs(n), then the number of BLAS threads (for each of the distributed procs) automatically gets set to 1?
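One quick way to check (a sketch):

using Distributed, LinearAlgebra

addprocs(2)
@everywhere using LinearAlgebra
remotecall_fetch(() -> BLAS.get_num_threads(), 2)   # reports the BLAS thread count on worker 2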

As far as I know, this is a standard (default) setting on Julia 1.7.1 and recent earlier versions. As for 1.8, the last time I checked (Version 1.8.0-DEV.1177 (2021-12-25)) I was constantly hitting the default OS limits (1024 processes) when starting Julia with 1 thread on a 160-core (2 × 80) machine.

I'm not sure I quite understand. We now allow OpenBLAS to pick the number of threads, but that shouldn't run into the process limit. Can you file an issue describing exactly what is happening?

-viral

I don't think it is. It should probably be documented in the section on distributed computing in the manual.

The default OS process limit is 1024?
To explain: I work in HPC, and it is common to increase the limits for processes and pinned memory to large values, or even unlimited.