Maximum number of BLAS threads

Hi, consider the following example:

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 24

julia> using LinearAlgebra

julia> BLAS.set_num_threads(24)

julia> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
16

julia> BLAS.set_num_threads(12)

julia> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
12

It won’t let me use more than 16 threads for BLAS. The computer has 2 CPUs, each of which has 12 cores (24 hyper-threads). Where does the limit of 16 come from?

Thanks!

The limit comes from the flags used to build OpenBLAS: https://github.com/JuliaLinearAlgebra/OpenBLASBuilder/blob/5a6eca317de505abc3678e47f08f4355646f511e/build_tarballs.jl#L42-L47

IIRC, OpenBLAS allocates memory at initialization for the maximum number of threads it can start, so the limit can't be set arbitrarily high. I guess 16 was a reasonable choice at the time, but with high core counts becoming more mainstream, perhaps it would make sense to increase it.
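
To see whether a given build is capping your request, a small check like the one below might help. It is only a sketch: `request_blas_threads` is a hypothetical helper name, not part of LinearAlgebra, and it assumes Julia 1.6 or later, where `BLAS.get_num_threads()` is available. On the 1.1.1 session above, the `ccall` from the question plays the same role.

```julia
using LinearAlgebra

# Hypothetical helper: ask for `n` BLAS threads and return the count the
# library actually granted. Assumes Julia >= 1.6 for BLAS.get_num_threads.
function request_blas_threads(n::Integer)
    BLAS.set_num_threads(n)
    granted = BLAS.get_num_threads()
    if granted < n
        # The library silently clamps the request to its built-in maximum.
        @warn "BLAS thread request was capped" requested = n granted = granted
    end
    return granted
end

granted = request_blas_threads(Sys.CPU_THREADS)
println("BLAS is using $granted threads; the machine reports $(Sys.CPU_THREADS) logical CPUs")
```

If the returned value is smaller than what you asked for, the build's compiled-in maximum is the likely cause, as in the 24 versus 16 case above.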

Thanks for the link!

Is it as simple as changing the number in flags+=(NUM_THREADS=16), or are nontrivial modifications needed to make use of more threads?