Apple M1, M1 Pro, M1 Max and Julia developers

I just tried STREAMBenchmark.jl on my M1 and got 90 GB/s for some benchmarks on a single thread. Multithreading does not improve performance.
I too prefer (and primarily use) Linux, but I like to have the M1 around for benchmarking.
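
For anyone who wants to sanity-check a number like that without installing anything, here is a minimal sketch of a STREAM-style copy kernel in plain Julia. It is not STREAMBenchmark.jl's implementation; the names `copy_serial!`, `copy_threaded!` and `bandwidth_gbs` are made up for illustration, and the byte count assumes one 8-byte read plus one 8-byte write per element (the usual STREAM convention, which ignores write-allocate traffic).

```julia
using Base.Threads

# Single-threaded copy kernel: c[i] = a[i].
function copy_serial!(c, a)
    @inbounds @simd for i in eachindex(a, c)
        c[i] = a[i]
    end
    return c
end

# Same kernel, with the loop split across the available Julia threads.
function copy_threaded!(c, a)
    Threads.@threads for i in eachindex(a, c)
        @inbounds c[i] = a[i]
    end
    return c
end

# Best-of-`reps` bandwidth estimate in GB/s for a copy kernel on n Float64s.
function bandwidth_gbs(kernel!, n; reps = 5)
    a = rand(Float64, n)
    c = similar(a)
    kernel!(c, a)                      # warm-up run, also triggers compilation
    best = Inf
    for _ in 1:reps
        best = min(best, @elapsed kernel!(c, a))
    end
    bytes = 2 * n * sizeof(Float64)    # one read + one write per element
    return bytes / best / 1e9
end

n = 2^25                               # 256 MiB per array, well beyond cache
println("threads:  ", Threads.nthreads())
println("serial:   ", round(bandwidth_gbs(copy_serial!, n), digits = 1), " GB/s")
println("threaded: ", round(bandwidth_gbs(copy_threaded!, n), digits = 1), " GB/s")
```

Start Julia with something like `julia -t 8` so the threaded variant actually has more than one thread to use; if the two numbers come out about the same, a single core is already saturating memory.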

While it won’t be available in laptop chips, Intel’s upcoming Sapphire Rapids will offer some chips with HBM, and some sources are speculating about memory bandwidth of 1 TB/s or so (divided among many more cores, of course).
AMD’s 3D stacking/V-Cache will give many of their chips a very large L3 cache, which (depending on the workload) could help a great deal as well. The cache itself could have 2 TB/s of bandwidth per chiplet, but at 32+64 MiB it is much smaller than HBM modules.
So there are still some interesting developments coming in x86/Linux-compatible land in the next year.

The M1 Pro/Max have 10 perf cores

8 perf + 2 efficiency.

I may just need to test them more to get threading deadlocks, but I haven’t seen them from LoopVectorization/Polyester. My impression (having not investigated it much) is that base threading and libraries using it are at risk, particularly in code that spawns tasks relatively rapidly.
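
To make "spawns tasks relatively rapidly" concrete, this is roughly the pattern I mean, as a minimal sketch. It is not a known reproducer for any deadlock, and `spawn_storm` is a made-up name; it just fires off many tiny `Threads.@spawn` tasks in quick succession and waits for all of them.

```julia
using Base.Threads

# Spawn `ntasks` very short tasks back to back and wait for all of them.
# Each task only bumps an atomic counter, so scheduling overhead dominates.
function spawn_storm(ntasks::Integer)
    counter = Threads.Atomic{Int}(0)
    @sync for _ in 1:ntasks
        Threads.@spawn Threads.atomic_add!(counter, 1)
    end
    return counter[]
end

# If this call ever failed to return, that would be the kind of hang in question.
println(spawn_storm(100_000))
```

Run it with several threads (for example `julia -t 8`) and in a loop if you want to stress the scheduler; on a healthy setup it should simply print the task count.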
