A side comment / feature request: we have JULIA_EXCLUSIVE=1 for compact pinning of Julia threads (i.e. pin Julia threads 1:N to the first N cores). If we had more information about the system (sockets / NUMA domains), we could also offer "scattered pinning", where Julia threads are pinned to cores from both sockets in an alternating fashion. This can have a big influence on performance (MFlops/s); see e.g. JuliaPerf/BandwidthBenchmark.jl on GitHub, which measures memory bandwidth using TheBandwidthBenchmark. (Also check it out if you just like Unicode plots.)
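To make the difference between the two strategies concrete, here is a minimal sketch that only computes which core each thread would go to; it does not actually pin anything. The helpers `compact_cores` and `scattered_cores` are hypothetical, not part of Julia or any package, and the sketch assumes core IDs are numbered contiguously per socket (socket 0 holds cores 0 to cps-1, socket 1 the next cps, and so on), which real hardware does not guarantee. On a 2-socket machine with 8 cores per socket, compact pinning puts 4 threads all on socket 0, while scattered pinning spreads them over both sockets.

```julia
# Hypothetical helpers: compute thread-to-core assignments only.
# ASSUMPTION: core IDs are contiguous per socket, i.e. socket s owns
# cores s*cores_per_socket : (s+1)*cores_per_socket - 1.

# Compact pinning: thread i (1-based) goes to core i-1 (0-based).
function compact_cores(nthreads::Integer)
    return collect(0:nthreads-1)
end

# Scattered pinning: consecutive threads alternate between sockets,
# filling each socket's cores in order.
function scattered_cores(nthreads::Integer, nsockets::Integer, cores_per_socket::Integer)
    nthreads <= nsockets * cores_per_socket ||
        throw(ArgumentError("more threads than available cores"))
    cores = Vector{Int}(undef, nthreads)
    for t in 0:nthreads-1
        socket = t % nsockets   # alternate sockets: 0, 1, 0, 1, ...
        slot = t ÷ nsockets     # next unused core on that socket
        cores[t+1] = socket * cores_per_socket + slot
    end
    return cores
end

# Example: 4 threads on a machine with 2 sockets x 8 cores each
println(compact_cores(4))           # [0, 1, 2, 3]
println(scattered_cores(4, 2, 8))   # [0, 8, 1, 9]
```

With the compact layout only one socket's memory controllers are in use until its cores run out, which is one plausible reason the pinning strategy shows up so clearly in memory-bandwidth benchmarks.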
But let me stop derailing this thread.