It's hard to comment without a concrete example. However, regarding the 8 BLAS threads, please see: BLAS performance testing for Julia 1.8. Not sure if this is of interest, but thanks to libblastrampoline you can also easily switch the BLAS backend and use another library, e.g. MKL, instead of the default OpenBLAS. Depending on the matrix size and the hardware, this can sometimes provide significant benefits. BLAS performance can be checked with e.g. BLASBenchmarksCPU.jl or BLASBenchmarksGPU.jl. Hope it helps.
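
For a quick check before reaching for a full benchmark suite, something like the rough sketch below may be enough. It only uses the LinearAlgebra standard library and assumes Julia 1.7 or newer (for `BLAS.get_config`). The helper `time_matmul`, the matrix size of 1000 and the thread counts are my own illustrative choices, not taken from the linked thread, and the commented-out `using MKL` line assumes the MKL.jl package is installed.

```julia
using LinearAlgebra

# Show which BLAS library libblastrampoline currently forwards to
# (OpenBLAS by default).
println(BLAS.get_config())

# To try MKL instead, install MKL.jl and load it first (assumption: package installed):
#   using MKL

# Hypothetical helper: time one n-by-n matrix product with a given BLAS thread count.
function time_matmul(n::Integer, nthreads::Integer)
    BLAS.set_num_threads(nthreads)
    A = rand(n, n)
    B = rand(n, n)
    A * B                   # warm-up run so compilation is not timed
    return @elapsed A * B   # elapsed seconds for a single product
end

# Compare a few thread counts; sizes and counts here are arbitrary.
for t in (1, 2, 4, 8)
    println("threads = ", t, ": ", time_matmul(1000, t), " s")
end
```

A single `@elapsed` measurement is noisy, so treat the numbers only as a first indication. Running the same script once with and once without `using MKL` at the top should give a rough idea whether switching the backend is worth it for your matrix sizes.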