Multithreading on Apple M5 chips

I’m considering upgrading to the new Apple M5-series hardware and have a question regarding how Julia will handle the new core tiers.

Previous generations of Apple Silicon had two types of cores, “performance” (P-cores) and “efficiency” (E-cores). On M1–M4 chips, starting Julia with `julia -t auto` defaults to the number of P-cores. My limited understanding is that this is because the E-cores have significantly lower clock speeds and IPC, so including them in a standard parallel workload can create bottlenecks.
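For anyone following along, the relevant knobs are easy to inspect from inside a session (the `-t` values below are just examples):

```julia
# Start Julia with a thread count, e.g.:
#   julia -t auto        # let Julia pick a default
#   julia -t 8           # request 8 threads explicitly
# or equivalently: JULIA_NUM_THREADS=8 julia

# Then inspect what you actually got:
println(Threads.nthreads())  # threads in the default threadpool
println(Sys.CPU_THREADS)     # logical CPU count reported by the OS
```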

With the new M5 chips, there are now three types of cores: “super cores”, “performance cores”, and “efficiency cores”. The new P-cores are something of a middle tier and are “optimized for power-efficient, multithreaded workloads”. This table from Andrew Cunningham’s Ars Technica review of the 2026 16" MacBook Pro, which dropped today, summarizes the SoCs nicely:

| Model | Fastest cores | Medium cores | Efficiency cores | GPU cores | Memory bandwidth |
|-------|---------------|--------------|------------------|-----------|------------------|
| M5 Max | Up to 6 (super) | Up to 12 (performance) | 0 | Up to 40 | Up to 614 GB/s |
| M5 Pro | Up to 6 (super) | Up to 12 (performance) | 0 | Up to 20 | 307 GB/s |
| M5 | 4 (super) | 0 | 6 | Up to 10 | 153 GB/s |
| M4 Max | Up to 12 (performance) | 0 | 4 | Up to 40 | Up to 546 GB/s |
| M4 Pro | Up to 10 (performance) | 0 | 4 | Up to 20 | 273 GB/s |
| M4 | 4 (performance) | 0 | 6 | Up to 10 | 120 GB/s |

Since the M5’s P-cores are significantly more capable than the E-cores, I’m curious whether anyone knows what the expected multithreading behavior will be (and, eventually, what it actually is). Basically, will `julia -t auto` on an M5 Pro/Max chip detect only the S-cores, or will it include the P-cores as well? If it’s the former, the top M5 Max (with 6 S-cores) might actually look like a downgrade for certain workloads compared to the top M4 Max (with 12 P-cores)! Supposing `-t auto` detects only S-cores, would Bad Things happen if I forced Julia to use all cores (e.g., `julia -t 18`)?

I know the M5 Pro/Max machines aren’t available until Friday, so I expect hard data to be limited for a while. I’m hoping to learn more as people get these units in hand, and any thoughts or early experiences are greatly appreciated!


Julia is generally not at all aware of different core types (at least it wasn’t last time I checked). I suspect that it will detect all cores. (On some systems the auto heuristic divides the number of cores by two but that has nothing to do with core efficiency.)

Personally, I wouldn’t worry that much and wouldn’t base my decision on this criterion.
You can always tune the number of threads as desired. Also, single-thread performance is often much more important, and I suspect that the new series is better in this regard.

Currently, the code that determines the number of cores on Apple Silicon leads me to believe that only the super cores will be detected. I’ve opened an issue to discuss.
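If you want to poke at the core topology yourself, macOS exposes per-tier core counts via `sysctl` (the `hw.perflevel*` keys are what I see on M1–M4; whether the M5 reports a third performance level is my assumption, not something I’ve verified):

```shell
sysctl hw.nperflevels               # number of core tiers (2 on M1–M4)
sysctl hw.perflevel0.physicalcpu    # cores in the fastest tier
sysctl hw.perflevel1.physicalcpu    # cores in the next tier
```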


Something else to clarify is that `-t auto` is only about guessing the number of OS threads to spawn. Julia doesn’t pin the threads to cores (at least not without @carstenbauer’s ThreadPinning.jl), so the OS is free to schedule them wherever it wants (including efficiency cores). So the detection stuff is just about guessing a good number to use by default, and you can always pass whatever number you want.
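For completeness, on Linux that pinning looks roughly like this (a sketch using ThreadPinning.jl’s documented API; it doesn’t apply on macOS, where thread affinity isn’t exposed):

```julia
using ThreadPinning  # Linux-only

pinthreads(:cores)   # pin each Julia thread to a distinct physical core
threadinfo()         # print which core each thread is pinned to
```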


To expand on this, you cannot really pin threads on macOS. You can only ask nicely by setting thread priorities (e.g., for realtime audio).

Depending on the workload, skipping the E-cores entirely could be worse than a strategy that assigns independent work chunks starting with the S-cores, moving on to the P-cores, and finally, when everything else is exhausted, using the E-cores.

Also, it’s certainly suboptimal for two uncoordinated user-space apps to each start 18 threads, or to each run their own thread pools that don’t know about each other.
