In terms of raw CPU performance it won’t beat an AMD Ryzen 7950X. The memory subsystem, however, has more bandwidth and more RAM, so in cases where the 7950X is memory-bound the M2 might not be.
So it is usually better to think of Apple’s CPUs in the context of what they enable rather than raw benchmark numbers.
Mainly because their GPU memory is shared with the CPU, which means they can do things that require a lot of memory (they are very attractive to those who run inference with large models).
I have the M2 Max (The top tier configuration of the Max).
It doesn’t seem like it currently does. I wonder how feasible it is, given that the sparse routines in Accelerate are probably(?) not in one-to-one correspondence with SuiteSparse’s.
I hadn’t given it much thought until I saw this thread, but the speedup from using Accelerate over OpenBLAS can hardly be overstated! Maybe that was clear to everyone but me.
Simple benchmark: matmul on dense 1000x1000 Float32 matrices yields a whopping speedup of ~4x for me (with 4 BLAS threads). Granted, I tried it on a “meager” M1 (not Pro, Max or Ultra), so it might be less pronounced on the beefier SoCs.
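For reference, here is a minimal sketch of that kind of benchmark (sizes and thread count as above; the AppleAccelerate line is an assumption on my part and only works on macOS):

```julia
using LinearAlgebra

BLAS.set_num_threads(4)
A = rand(Float32, 1000, 1000)
B = rand(Float32, 1000, 1000)
C = similar(A)

mul!(C, A, B)                       # warm-up run to exclude compilation time
t_default = @elapsed mul!(C, A, B)  # time with the currently loaded BLAS

# On macOS, uncomment to forward BLAS calls to Accelerate, then re-time:
# using AppleAccelerate
# t_accelerate = @elapsed mul!(C, A, B)
```

Timing a single call like this is noisy; for anything serious, BenchmarkTools.jl’s `@btime` is the usual choice.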
Still, pretty great for a drop-in replacement.
Or frustrating, because Apple is, from what I learned today, pretty tight-lipped about their “secret ingredient”, which is a FMA accelerator unit in the Silicon chips dubbed Apple Matrix Coprocessor (AMX). Apple does not disclose how to use it, and officially makes it only accessible through the Accelerate framework. It’s been reverse engineered though.
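For anyone wanting to try it from Julia: as I understand it (this is my reading of the AppleAccelerate.jl README, not something I’ve verified across versions), simply loading the package forwards BLAS/LAPACK calls to Accelerate via libblastrampoline:

```julia
using LinearAlgebra
using AppleAccelerate   # on load, forwards BLAS/LAPACK to Apple's Accelerate

# Inspect which backends libblastrampoline is currently dispatching to;
# Accelerate should now appear among the loaded libraries.
BLAS.get_config()
```

So you never touch the AMX units directly; Accelerate decides internally when to use them.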
In my case, I am using an M1 Max instead of the Ultra. I can say that it easily beats a huge Xeon server when performing complex satellite simulations. In this case, we do not have much room for parallelization, which would favor the Xeon (64 cores).
Given that the M1 Max I am using cost a fraction of the price of the Xeon server, IMHO the M-series computers from Apple are the best platforms for this kind of computation.
Yes, it is! The Xeon is of course much more capable overall, but its performance on those simulations is way behind: the M1 Max completes one scenario in about 40% less time. Hence the relative cost of the M1 would still be better even if I selected a lower-end Xeon, which would likely perform worse than the current one.
Sorry to jump into this conversation: we are currently looking for a workstation/server to run simulations for 7-10 researchers (mostly Julia, but also a lot of R and a bit of Python and MATLAB).
Why does nobody mention the AMD Threadripper CPUs? Looking at benchmarks, they seem to offer the best single-thread performance (which remains important for many custom programs) while not having the 128 GB RAM limitation of desktop chips, and still plenty of cores…
I have just acquired a workstation with an AMD Ryzen 9 7950X 16-core processor at 4501 MHz, 128 GB DDR5, and L1/L2/L3 caches of 1, 16, and 64 MB. It is very snappy. My colleague has now ordered an M2 Ultra (it hasn’t arrived yet, so I cannot run comparisons myself).
One of the main differences is the maximum memory bandwidth. For your CPU, Google says ~73 GB/s, which is decent but nowhere near the 800 GB/s you get with the M2 Ultra. And sparse matrix multiplication, which is at the heart of solving PDEs, is memory-bandwidth-limited.
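To make the bandwidth point concrete: a sparse matrix-vector product reads each stored nonzero (value plus index) exactly once and does only a multiply-add per nonzero, so its arithmetic intensity is far below 1 FLOP/byte and throughput is set by how fast memory can stream the data, not by the FLOP rate. A quick sketch (sizes are illustrative, ~5 nonzeros per row like a PDE stencil):

```julia
using SparseArrays, LinearAlgebra

n = 1_000_000
A = sprand(n, n, 5 / n)   # random sparse matrix, ~5 nonzeros per row
x = rand(n)
y = similar(x)

# Each stored value and column index is streamed from memory once per
# call, so repeated mul! calls are dominated by memory bandwidth.
mul!(y, A, x)
```

That is why the 800 GB/s vs ~73 GB/s gap can matter far more here than core counts or clock speeds.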