For distributed matmul, you should set the number of BLAS threads to 1 on each worker process with BLAS.set_num_threads(1); otherwise you’ll oversubscribe the cores.
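
Here is a minimal sketch of how that might look, assuming the workers are started through the Distributed standard library with roughly one worker per core. The worker count of 2, the matrix sizes, and the names `blas_threads` and `results` are placeholders for illustration only.

```julia
using Distributed
using LinearAlgebra

# Assumption: 2 local workers, purely for illustration.
addprocs(2)

# Load LinearAlgebra on every process so BLAS is available there.
@everywhere using LinearAlgebra

# Limit BLAS to a single thread on the master and on every worker.
@everywhere BLAS.set_num_threads(1)

# Ask each worker how many BLAS threads it is now using.
blas_threads = Dict(p => remotecall_fetch(BLAS.get_num_threads, p) for p in workers())

# Hypothetical workload: each task multiplies its own pair of matrices.
results = pmap(1:4) do _
    A = rand(200, 200)
    B = rand(200, 200)
    A * B
end
```

The reasoning is that each worker process runs its own BLAS thread pool, so with N workers and the default multi-threaded BLAS you can end up with many more runnable threads than cores. With one BLAS thread per worker, the parallelism comes from the workers alone.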