Thanks Norman!
The 15 GFLOPS result is consistent with the 60 GB/s bandwidth observed in the STREAM Triad benchmark run by Phoronix. It’s actually slower than the 6-year-old Cascade Lake 10980XE, which achieved 19 GFLOPS in Julia, as reported by @Elrod.
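A quick back-of-the-envelope check on that, as a sketch only: I'm assuming the kernel is memory-bound and does roughly 2 FLOPs (one multiply, one add) per 8-byte Float64 element streamed from memory, as in a matrix-vector product. That kernel shape is my assumption, and `bandwidth_bound_gflops` is just a helper name made up for illustration.

```julia
# Bandwidth-bound ceiling: attainable GFLOPS = bandwidth (GB/s) * arithmetic intensity (FLOP/byte).
# ASSUMPTION: 2 FLOPs per 8-byte Float64 element read from memory (not stated in the benchmark itself).
bandwidth_bound_gflops(bandwidth_gbs, flops_per_byte) = bandwidth_gbs * flops_per_byte

intensity = 2 / sizeof(Float64)                      # 0.25 FLOP/byte under the assumption above
ceiling = bandwidth_bound_gflops(60.0, intensity)    # 60 GB/s is the STREAM Triad figure quoted above

println("Bandwidth-bound ceiling: ", ceiling, " GFLOPS")
```

Under that assumption the ceiling comes out at 15 GFLOPS, which is why the measured number looks bandwidth-limited rather than compute-limited. A kernel with a different FLOP-per-byte ratio would give a different ceiling.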
I’ll be publishing a video in the coming days covering some linear algebra performance testing I did on the M4 Max and M3 Ultra in Julia. In this particular test, the M4 Max is 5 times faster, and on 12 P-cores it achieves STREAM bandwidth comparable to a 96-core, 12-channel DDR5 Zen 5 EPYC.