Is the Mac M1 slower with multiple threads than with a single thread?

You can use BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate") to start forwarding BLAS calls to Accelerate. However, Accelerate's interface is LP64 (32-bit integer arguments), while Julia's native * for GEMM expects an ILP64 (64-bit integer) library, so * will still dispatch to OpenBLAS. To use Accelerate for matrix multiplication you need to write your own wrapper. The full gist has a demonstration.
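A minimal sketch of such a wrapper follows. It assumes macOS and Accelerate's classic CBLAS interface (cblas_dgemm with 32-bit int arguments, and enum values taken from the standard cblas.h header). The helper names accelerate_gemm!, accelerate_gemm and lp64 are hypothetical. The sketch also calls Accelerate directly through Libdl instead of going through libblastrampoline, so it is not necessarily what the gist does.

```julia
using Libdl

# Path from the post above; this library only exists on macOS.
const ACCELERATE = "/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate"

# CBLAS enum values as defined in the standard cblas.h header (assumption: Accelerate uses the same values).
const CblasColMajor = Cint(102)
const CblasNoTrans  = Cint(111)

# Accelerate's classic interface is LP64: every integer argument is a 32-bit `int`.
# Cint(x) throws an InexactError if a dimension does not fit, instead of silently truncating.
lp64(x::Integer) = Cint(x)

"""
    accelerate_gemm!(C, A, B; alpha=1.0, beta=0.0)

Hypothetical helper: computes C = alpha*A*B + beta*C by calling Accelerate's
`cblas_dgemm` directly, bypassing Julia's `*` (which uses the ILP64 path).
"""
function accelerate_gemm!(C::Matrix{Float64}, A::Matrix{Float64}, B::Matrix{Float64};
                          alpha::Float64 = 1.0, beta::Float64 = 0.0)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("A has $k columns but B has $k2 rows"))
    size(C) == (m, n) || throw(DimensionMismatch("C must be $m×$n"))

    # Look up the C symbol at runtime; this fails on systems without Accelerate.
    fptr = dlsym(dlopen(ACCELERATE), :cblas_dgemm)

    # Column-major, no transposes; leading dimensions are the row counts.
    ccall(fptr, Cvoid,
          (Cint, Cint, Cint, Cint, Cint, Cint, Float64, Ptr{Float64}, Cint,
           Ptr{Float64}, Cint, Float64, Ptr{Float64}, Cint),
          CblasColMajor, CblasNoTrans, CblasNoTrans,
          lp64(m), lp64(n), lp64(k),
          alpha, A, lp64(max(1, m)),
          B, lp64(max(1, k)),
          beta, C, lp64(max(1, m)))
    return C
end

# Allocating convenience version, analogous to A * B.
accelerate_gemm(A::Matrix{Float64}, B::Matrix{Float64}) =
    accelerate_gemm!(zeros(size(A, 1), size(B, 2)), A, B)
```

On an M1, accelerate_gemm(A, B) should agree with A * B up to rounding. Converting every size with Cint(...) makes an oversized dimension fail loudly, which is the practical hazard of passing Julia's 64-bit Int values to an LP64 library.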
