How to deal with more threads than requested?

I have a program that I instruct to use the 32 physical cores on a machine, but it typically runs 37 or 38 threads, presumably because one of the loaded packages uses BLAS, which starts threads of its own.
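One way to see where a count like 37 or 38 comes from is to compare the number of OS threads in the process with the number you asked for. The sketch below is a hypothetical Python helper (`os_thread_count` is an illustrative name, not from any library). It assumes Linux, where `/proc/self/status` has a `Threads:` field that counts every native thread, including ones a BLAS or OpenMP runtime starts inside a C library. Python's `threading.active_count()` only sees threads created through Python, so the two numbers can differ.

```python
import os
import threading


def os_thread_count():
    """Return the number of OS-level threads in this process, or None if unknown.

    Assumes Linux: reads the "Threads:" field of /proc/self/status, which counts
    all native threads, including those started by BLAS/OpenMP in C libraries.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        # Not Linux, or /proc is unavailable.
        pass
    return None


if __name__ == "__main__":
    # Native threads (includes library-internal threads) vs. Python-created threads.
    print("OS threads:", os_thread_count())
    print("Python-visible threads:", threading.active_count())
```

Running this after the heavy packages are loaded and after the worker threads have started shows how many extra threads the libraries added.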

As a general rule, would it be faster to request fewer than 32 cores, or to leave things as they are?

(Obviously, I can time things for this particular program, but this is going to be a recurring issue for other programs, also.)

IMO, slight and sporadic oversubscription throughout the execution of the whole program is usually fine, and may even improve performance depending on the task.
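If timing does show that the extra threads hurt, a common remedy is to cap the BLAS thread count so that worker threads times BLAS threads fits within the core count. Below is a minimal Python sketch under that assumption. The function names are hypothetical. `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` are the variables read by OpenMP, OpenBLAS and MKL respectively, and they only take effect if set before the BLAS library is loaded.

```python
import os

# Thread-count variables read by common OpenMP/BLAS runtimes when they load.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def oversubscription_ratio(running_threads: int, cores: int) -> float:
    """Threads per core; values above 1.0 mean the cores are oversubscribed."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return running_threads / cores


def blas_thread_budget(cores: int, worker_threads: int) -> int:
    """Largest per-worker BLAS thread count with workers * blas <= cores, never below 1."""
    if cores < 1 or worker_threads < 1:
        raise ValueError("cores and worker_threads must be >= 1")
    return max(1, cores // worker_threads)


def cap_blas_threads(n: int) -> None:
    """Set the thread-count variables (hypothetical helper).

    Only effective if called before the BLAS-backed package is imported.
    """
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n)


if __name__ == "__main__":
    # 38 threads on 32 cores: about 19% more threads than cores.
    print(oversubscription_ratio(38, 32))
    # 32 worker threads on 32 cores leaves 1 BLAS thread per worker.
    cap_blas_threads(blas_thread_budget(32, 32))
    # Import numpy (or another BLAS-backed package) only after this point.
```

With 32 workers on 32 cores the budget is 1 BLAS thread each; with 4 workers it would be 8. Whether capping actually beats mild oversubscription depends on the workload, so it is still worth timing both.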
