I run multithreaded Julia code on an Ubuntu machine. It has two AMD CPUs with 128 physical cores in total (64 per chip) and 256 hardware threads. (Julia lets me set a thread count higher than that, which I find strange, but that is a digression…)
About 10 minutes after I started running a loop, the machine began to beep and the indicator LED turned red. After entering the BIOS, I found this log:
So the cause was that the temperature inside the case got too high. Of course I should take physical measures, e.g. adding fans.
But I also want to ask: what kinds of Julia operations are most likely to stress the CPU and make it produce enormous heat? And how can I avoid them (as asked in the title)?
One situation I could probably come up with is that, instead of writing
I don’t think that this is specific to Julia. Anytime a CPU is not idle it will be generating heat. Though I’ve read that SIMD-heavy or memory-intensive (cache-thrashing) code can generate even more heat.
(So definitely don’t keep a CPU running instructions if you don’t need it to. E.g. don’t use spinlocks to wait a long time, use other types of locks and Julia’s higher level synchronization mechanisms like channels that allow a CPU to sleep until there is something to do.)
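To make the channel suggestion concrete, here is a minimal sketch of a worker that blocks on a Channel instead of spinning on a flag. The names run_worker, jobs and results are made up for illustration, and the buffer size of 8 is arbitrary. While the worker waits for a job its task is suspended, so the thread can run other tasks or go idle rather than burn cycles.

```julia
# Hypothetical worker: squares each job it receives.
function run_worker(jobs::Channel{Int}, results::Channel{Int})
    for job in jobs              # blocks (the task yields) until a job arrives
        put!(results, job^2)
    end                          # loop ends once `jobs` is closed and drained
end

jobs = Channel{Int}(8)           # buffered channels; size chosen arbitrarily
results = Channel{Int}(8)
worker = Threads.@spawn run_worker(jobs, results)

for i in 1:4
    put!(jobs, i)
end
close(jobs)                      # tells the worker there is no more work
wait(worker)                     # waits without busy-looping
close(results)
squares = collect(results)       # drain whatever the worker produced
```

The same pattern works with several workers reading from one jobs channel. This only avoids wasted spinning; useful work on every core will still produce heat.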
Fundamentally this means that if you intend to use all your cores to do work for an extended time then you will need better cooling. But improving memory locality might help a bit.
There’s a whole list of things you can do, including simply adding more cooling. However, a properly built PC should not reach critical temperatures even under high load. You should check whether they forgot to remove the sticker from the cooler that’s already in there. I have unfortunately seen this more than once.
Secondly, you can try turning down the core clock frequency somewhat. This will slightly reduce performance, but the efficiency gains can be substantial. My overclocked AMD CPU (watercooled) consumes almost 1.5x as much power as a stock configuration. You can usually change this in the BIOS.
This is not just Julia: a lot of numerical software packages, from Jax to Matlab, grab all of the cores by default. I have always thought this is not very composable. But if you don’t do it, users complain that you are slow compared to the others. So you have to be aware of this and learn how to disable threading in each system when needed.
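For Julia specifically, a rough sketch of what limiting threading can look like: cap the BLAS thread count, and spread a loop over a fixed number of tasks instead of every available thread. capped_map is a hypothetical helper, not a library function, and it assumes xs is non-empty and indexable. The overall thread count can also be lowered at startup with julia -t N or the JULIA_NUM_THREADS environment variable.

```julia
using LinearAlgebra

# Keep BLAS calls (matrix multiply, factorizations) on a single thread.
BLAS.set_num_threads(1)

# Hypothetical helper: apply f to xs using at most `ntasks` concurrent tasks.
# Assumes xs is non-empty.
function capped_map(f, xs; ntasks::Int = 4)
    chunksize = cld(length(xs), ntasks)           # ceiling division
    chunks = Iterators.partition(xs, chunksize)   # at most ntasks chunks
    tasks = [Threads.@spawn map(f, chunk) for chunk in chunks]
    return reduce(vcat, fetch.(tasks))            # results in original order
end

out = capped_map(x -> x^2, 1:10; ntasks = 3)
```

With ntasks smaller than Threads.nthreads(), at most ntasks threads are busy with this loop at any time, which trades throughput for lower sustained load.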
Making the application handle high processor temperatures should be a very low priority, and is perhaps best discouraged. This logic is hard to make portable and reliable, and it only controls the application itself. An almost-idling loop such as while critical_hot(); Libc.systemsleep(1.0); end still doesn’t stop the operating system from running background processes (sleep actually encourages Julia to run other Tasks), and even truly idle processors draw current and generate heat to stay ready (we wouldn’t want registers to forget data).
There are already processor-level throttling and OS-level power plans to reduce processor activity. The logs indicate throttling, so your cooling system is simply removing heat more slowly than the processor expects. At a hypothetical extreme, a bad enough cooling system would cause overheating even with all processors idle. I’d start checking there before tweaking the processor.