As far as I understand, the general philosophy taken so far by the Julia developers is that an optimization should only be applied automatically if they know for sure that
- it's correct / safe to apply the optimization, and
- the optimization won't accidentally hurt performance.
Unfortunately, implicit multi-threading makes both of the above criteria very difficult to satisfy. Even if the safety / correctness concern were addressed (which is not trivial to do), multi-threading has a lot of overhead. The general heuristic is that spawning a multi-threaded task in Julia takes about 1 microsecond, which is on the order of 1000 CPU cycles. This means that if I write

```julia
for i in 1:N
    f(i)
end
```

and the loop takes less than ~10 microseconds to run, it was probably a mistake to try to multi-thread it. However, how long the loop takes to run depends not only on `N`, but on the details of `f`. Knowledge about how to handle this correctly is not something our compiler currently has, or is likely to have anytime soon.
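To make the overhead concrete, here is a rough sketch (the exact crossover point depends on your hardware, thread count, and Julia version; the function names are mine, not from any package):

```julia
using Base.Threads

# Serial version: each iteration is only a few nanoseconds of work.
function serial_sum(N)
    s = 0.0
    for i in 1:N
        s += sqrt(i)
    end
    return s
end

# Threaded version: @threads partitions 1:N across tasks, and each
# task spawn costs on the order of a microsecond.
function threaded_sum(N)
    partials = zeros(nthreads())
    @threads for i in 1:N
        partials[threadid()] += sqrt(i)
    end
    return sum(partials)
end

# For small N (say, N = 100), the task-spawn overhead dominates and the
# serial loop wins; only once the loop body is expensive enough does the
# threaded version pay off. A compiler can't know which regime you're in
# without knowing the cost of the loop body.
```

Benchmarking both with `@btime` from BenchmarkTools.jl at a few values of `N` makes the crossover easy to see for yourself.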
Instead, we generally insist that the programmer opts in to optimizations like multi-threading explicitly, because they know more about their program than the compiler does. However, we try to make it very easy to opt in to these sorts of things, which is where the performance annotations in Base (`@fastmath`, etc.) and various packages like KernelAbstractions.jl, LoopVectorization.jl, and ThreadsX.jl come in.
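For example, opting in to multi-threading is often a one-line annotation. A minimal sketch (the loop body here is an arbitrary stand-in for real work):

```julia
using Base.Threads

results = zeros(1_000)
# The programmer knows each iteration does enough work to amortize the
# ~1 microsecond task-spawn cost, so they opt in explicitly:
@threads for i in 1:1_000
    results[i] = sum(sqrt(j) for j in 1:10_000)
end
```

The same opt-in pattern applies to the packages above: `ThreadsX.map` as a drop-in threaded `map`, or `@turbo` from LoopVectorization.jl for SIMD, each chosen by the programmer because they know the workload justifies it.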