I have a program that I configure to use all 32 physical cores on a machine. In practice it typically ends up using 37 or 38, presumably because one of the loaded packages calls a multithreaded BLAS.
As a general rule, would it be faster to request fewer than 32 cores, so that the total stays within the physical core count, or to leave things as they are?
(Obviously, I can time this particular program, but the same issue is going to come up with other programs too.)
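
To make the alternative concrete, here is a minimal sketch of capping the BLAS threads instead of lowering the worker count. It is written in Python purely as an assumption, since the language is not the point. The helper names `blas_threads_per_worker` and `cap_blas_threads` are hypothetical; the environment variables are the ones read by OpenMP, OpenBLAS and MKL, and they should only take effect if set before the BLAS library is loaded.

```python
import os

# Environment variables that common BLAS/OpenMP backends read at load time.
BLAS_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def blas_threads_per_worker(physical_cores: int, workers: int) -> int:
    """Return the largest per-worker BLAS thread count such that
    workers * threads <= physical_cores, with a floor of 1."""
    if physical_cores < 1 or workers < 1:
        raise ValueError("physical_cores and workers must be positive")
    return max(1, physical_cores // workers)


def cap_blas_threads(n_threads: int, env=None) -> dict:
    """Write the thread cap into `env` (defaults to os.environ) and
    return the values that were set. Call this before importing any
    package that loads BLAS (hypothetical usage, not a library API)."""
    env = os.environ if env is None else env
    for name in BLAS_ENV_VARS:
        env[name] = str(n_threads)
    return {name: env[name] for name in BLAS_ENV_VARS}


if __name__ == "__main__":
    # 32 workers on 32 physical cores leaves room for 1 BLAS thread each.
    threads = blas_threads_per_worker(physical_cores=32, workers=32)
    print(cap_blas_threads(threads))
```

With 32 workers this pins BLAS to a single thread per worker, so the total should stay at 32; whether that is actually faster than the oversubscribed 37 or 38 is exactly what I am asking.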