Mac mini M4 pro vs AMD Ryzen 9 9950X for Linear Algebra?

How much does it cost in total?

The entire build (including case, fans, motherboard, 4TB SSD etc) costs nearly 1700 USD after tax. Everything was bought on Amazon on Prime Day although the prices remain very close to what I see at this moment.

3 Likes

Hello,
I am also planning to build a “personal server” with 9950X and now gathering information about which parts to use. If possible, could you also tell us what motherboard and CPU cooler you chose? Are there important points to take into account about the choice of parts (e.g. durability, heat, etc) for running a long-time job (say for 1 week)? (FYI, I am using 5950X PC with Noctua’s large CPU fan, but wondering if it is enough also for 9950X.)

For the cooler, I use the Thermalright Peerless Assassin 140. Somehow it has been out of stock since Prime Day; only the smaller 12 cm version is on Amazon at the moment. It’s essentially a budget version of Noctua’s large dual-tower cooler. I didn’t get the Noctua just because it is expensive. Thermalright products turn out to be surprisingly good given the much lower prices. I was expecting something loud, but it is not bad at all. Noctua is considered the best air cooler, so I don’t think there will be any issue with the 9950X (under default CPU settings).

For the motherboard, I picked the X870 Aorus Elite from Gigabyte. I don’t think there is anything special here. It’s mostly about how many ports are needed, unless someone cares a lot about overclocking the CPU. I just prefer to have the USB4 support that comes with X870/X870E. The bad news is that the on-board LAN controller on this Gigabyte board is not detected by Ubuntu (at least not without extra work). But I happen to have an adapter that works.

2 Likes

I think it is important to note that the dense matrix-matrix product (gemm) is a very particular case among Linear Algebra operations, because its performance does not depend on the RAM bandwidth (it is compute bound).

Most LA operations (BLAS1 and BLAS2) are memory bound and thus scale with RAM bandwidth (for large enough sizes).

The RAM bandwidth of the M4 Pro is very large (273 GB/s), and that of the M4 Max even larger (546 GB/s).

This is at least 2x faster than the 9950X (AMD’s Ryzen 9950X: Zen 5 on Desktop - by Chester Lam).
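
To make the compute-bound vs memory-bound distinction concrete, here is a rough roofline-style sketch in Python. It assumes the usual idealized counts (Float64, every operand moved to or from RAM exactly once) and uses the 273 GB/s figure quoted above for the M4 Pro. `PEAK_GFLOPS = 1000` is a made-up placeholder, not a measured value for any CPU, and the function names are only illustrative.

```python
# Roofline-style estimate: attainable rate = min(peak, intensity * bandwidth).
# Idealized traffic counts; real kernels move more data than this.

BYTES = 8  # size of one Float64 value


def intensity_axpy(n):
    """BLAS1, y <- a*x + y: 2n flops; read x, read y, write y."""
    return (2 * n) / (3 * n * BYTES)


def intensity_gemv(n):
    """BLAS2, y <- A*x (A is n-by-n): 2n^2 flops; read A and x, write y."""
    return (2 * n * n) / ((n * n + 2 * n) * BYTES)


def intensity_gemm(n):
    """BLAS3, C <- A*B (all n-by-n): 2n^3 flops; read A and B, write C."""
    return (2 * n**3) / (3 * n * n * BYTES)


def attainable_gflops(intensity, bandwidth_gbs, peak_gflops):
    """Upper bound on GFLOP/s for a kernel with the given flops-per-byte."""
    return min(peak_gflops, intensity * bandwidth_gbs)


BANDWIDTH_GBS = 273.0  # M4 Pro figure quoted in the thread
PEAK_GFLOPS = 1000.0   # hypothetical placeholder, NOT a measured peak

for name, fn in [("axpy", intensity_axpy),
                 ("gemv", intensity_gemv),
                 ("gemm", intensity_gemm)]:
    ai = fn(3000)
    bound = attainable_gflops(ai, BANDWIDTH_GBS, PEAK_GFLOPS)
    print(f"{name}: {ai:.3f} flop/byte, bound {bound:.1f} GFLOP/s")
```

With these assumptions axpy and gemv stay far below the placeholder peak and scale directly with bandwidth, while gemm hits the compute ceiling, which is why gemm-only benchmarks say little about BLAS1/BLAS2-style workloads.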

There are tons of physical simulations that are memory bound (CFD, wave propagation…) and that will run faster on Apple devices.

There are also tons of problems that depend on fast gemm.

My use case is mostly for doing computation that can be boiled down to some linear problems (possibly large matrix-matrix multiplications and solving linear systems repetitively).

Considering the title of this thread I think it is relevant to wonder about what part of Linear Algebra operations is targeted.

4 Likes

Much (much!) cheaper than Apple if 96 GB are required!

I would think other factors outweigh benchmark comparisons. Like:

  • OS environment: are you more comfortable with MacOS or Linux? Do you rely on some software that is available only on one platform?
  • cost of other hardware: RAM can be prohibitive if purchased at Apple
  • other uses for the computer: most likely you won’t run BLAS 100% of the time. Probably not even 10% of the time. What do you wish for the hardware to be able to do the rest of the time?

Btw, I just realized the new AMD ryzen mobile CPU isn’t actually that far away from the M4 pro:

… At a fraction of the price and very cheap upgrade costs for RAM + SSD (you can even bring your own):

I’d be really curious how those compare for actual Julia workloads (especially compilation of big packages).

If anyone has either of these, I’d love to hear about some benchmarks!

2 Likes

One important caveat of the Zen 5 mobile CPUs is that unlike the desktop parts, they don’t have full AVX 512, which is something that doesn’t show up in a lot of consumer-focused benchmarks, but is a very big deal for things like linear algebra.

For that reason, I’d expect that CPU to be a lot slower than the desktop parts. Not sure how much it’d suffer relative to the M4 Pro though for linear-algebra like stuff.


Edit: “don’t have AVX 512” → “don’t have full AVX 512”

2 Likes

This is not true. They have AVX-512, but it runs at a lower speed.

2 Likes

Sorry, what I meant to write is that they don’t have full AVX-512. They use the double-pumping technique that Zen 4 used to execute AVX-512 instructions while mostly only having 256-bit hardware.

The big performance gains that Zen 5 desktop had over Zen 4 desktop though were almost all from the switch to real AVX 512, and those gains likely won’t be present in the mobile parts

2 Likes

Doesn’t it make sense to compare against the mac mini at that point?

I think he is comparing to a mac mini? The computer he listed there is 900€, whereas

  • a 32GB RAM / 1TB storage mac mini with an M4 is 1620€
  • a 24 GB RAM / 1TB storage mac mini with an M4 pro is 1879€
  • a 48 GB RAM / 1TB storage mac mini with an M4 pro is 2339€ (there’s no 32 GB option).

The mini is a fantastic value machine, but typically only if you don’t upgrade it at all (i.e. stick to 16GB of RAM, 256 GB SSD).

1 Like

IMHO you may not compare SSD and RAM upgrade on apple silicon.

  • The SSD upgrade is awfully expensive and an external TB4/5 enclosure can be clever.
  • The RAM upgrade should be compared to a VRAM upgrade on discrete GPU since its speed is comparable to GPU RAM speed. This high speed RAM (shared by the CPU and GPU) is a consequence of the SOC architecture and should not be compared (IMHO again) to a discrete RAM upgrade on these (nice) AMD mini PCs.
1 Like

So you can’t compare it because it’s overpriced? That’s kinda the point of comparing. Yes, I would strongly suggest that anyone buying a Mac mini just go with the 256 GB variant and then buy an external SSD. But that’s not a point in Apple’s favour since you can also buy the same external ssds for other machines, and have an adequate main drive.

The RAM Apple is using is not any faster than the RAM others are using. In fact, the Beelink machine linked above uses LPDDR5 RAM running at 7500 MT/s, which is the same speed the base M4 uses for its LPDDR5 RAM. You can also go and buy the faster 8533 MT/s RAM they put in their M4 Pros on the market. It’s not special.

The only real difference is that the M4 has a wider memory interface, but that has nothing to do with the RAM they’re installing, and that wider interface doesn’t make it 200€ more expensive to slap 8 more gigs of RAM into a Mac mini, it’s just artificial price gating.

What’s going on here is that the base Mac Mini is being sold for an absolute steal of a price, probably with very low or maybe even zero margins, and then they’re making up for it by charging way above market prices for RAM / storage upgrades.

Yes, the Macs have better memory bandwidth, but if you’re not actually bandwidth constrained then that doesn’t really matter. If you are bandwidth constrained, then yes, there are circumstances where a Mac Mini with RAM upgrades can be worth it versus CPUs being released by AMD (though this might not last long, since AMD is expected to release APUs with wider memory interfaces next year).

5 Likes

I was probably unclear: I agree with you about the SSD upgrade but disagree about the RAM.

Here I disagree, the RAM placed on the SOC allows for higher speed, lower latency and low energy consumption. If you have an upgradable RAM, you will need a lot of CPU-RAM channels (8?) to match M4 speed. The latency depends on the length of the wires.

Here is the crucial point. I believe that a large majority of scientific computations are memory bound. One important exception is gemm (BLAS3) dominated computations where the bandwidth is irrelevant while the SIMD width shines. All the perf comparisons in this thread are based on this exception: I think that it is misleading.

Example: I currently work on time-dependent wave simulations on an MBP M1 Max with 64 GB of RAM, and I am able to run 768^3 mesh-based computations at a 200 GB/s effective bandwidth. On my Intel workstation (13900K + RTX 3070), I can reach about the same speed with CUDA but I am limited to 256^3 meshes. On the CPU, the effective bandwidth is about 30 GB/s…
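
For a sense of scale, here is a small Python sketch of the memory footprint and per-sweep time of such meshes. It assumes Float64 values and a purely bandwidth-limited sweep. The number of fields the simulation stores and touches per step is not given in the post, so `fields_touched` is a hypothetical parameter.

```python
# Back-of-the-envelope numbers for an n^3 mesh, assuming Float64 storage.

def field_gib(n, bytes_per_value=8):
    """Memory for one scalar field on an n^3 mesh, in GiB."""
    return n**3 * bytes_per_value / 2**30


def sweep_seconds(n, fields_touched, bandwidth_gbs, bytes_per_value=8):
    """Time to stream `fields_touched` fields once at the given GB/s.

    `fields_touched` is a hypothetical parameter: it depends on the
    numerical scheme, which the post does not describe.
    """
    total_bytes = n**3 * bytes_per_value * fields_touched
    return total_bytes / (bandwidth_gbs * 1e9)


for n in (256, 768):
    print(f"{n}^3: {field_gib(n):.3f} GiB per field")

# Same sweep at the two effective bandwidths quoted in the post.
for bw in (200, 30):
    t = sweep_seconds(768, fields_touched=1, bandwidth_gbs=bw)
    print(f"768^3, one field at {bw} GB/s: {t * 1e3:.1f} ms")
```

Under these assumptions a 768^3 field is 27 times larger than a 256^3 one, so how many fields fit in memory, and how fast they can be streamed, determines the feasible mesh size.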

This would be the same for CFD and many other physics simulations based on PDEs.

3 Likes

Ah okay I misunderstood you.

While the physically close RAM is an advantage, it’s mostly an advantage for latency, not bandwidth. The memory-bound performance of M-series Macs has very little to do with these shorter wires. The biggest factor for the bandwidth is that the M4 Pro has a 256-bit wide memory interface, and that’s definitely doable on hardware with upgradable RAM (though yes, it will increase power consumption).
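
As a quick check of how interface width and RAM speed combine, theoretical peak bandwidth is simply bytes per transfer times transfers per second. The sketch below is Python. The 256-bit / 8533 MT/s and 7500 MT/s figures come from this thread, while the 128-bit widths and the DDR5-5600 desktop configuration are my own assumptions for illustration.

```python
# Theoretical peak bandwidth = (bus width in bytes) * (transfers per second).

def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Peak bandwidth in GB/s for a bus width in bits and a rate in MT/s."""
    return bus_width_bits / 8 * transfer_rate_mts * 1e6 / 1e9


configs = {
    "256-bit @ 8533 MT/s (M4 Pro class)": (256, 8533),
    "128-bit @ 7500 MT/s (assumed width)": (128, 7500),
    "128-bit @ 5600 MT/s (assumed dual-channel DDR5 desktop)": (128, 5600),
}

for name, (bits, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, mts):.1f} GB/s")
```

The first configuration reproduces the 273 GB/s quoted earlier for the M4 Pro. At equal RAM speed, doubling the interface width doubles the peak, independent of whether the RAM is soldered.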

The CAMM2 standard may improve this situation in the near future.

I think this claim is heavily influenced by your personal experience. In my experience, this isn’t at all the case.

2 Likes

Yes, I think the benchmark should be comparable to the m4 pro mac mini (12 core + 1tb ssd) setup which starts at ~1800€… Comparing the HX 370 to the non pro version seemed unfair in this context :wink:

So almost double the price, while roughly giving the same performance on paper in the same form factor…

It’s of course still pretty hard to compare those as many argue, and 2 benchmarking numbers will certainly not be enough to paint the full picture.

I’d love to have a Julia Benchmark, which runs:

  1. E.g. compiling all of GLMakie
  2. some memory bound simulation in Julia
  3. some CPU bound simulation
  4. some GPU array code (not sure though if the GPU infrastructure is reliable enough, to not mainly benchmark the maturity of CUDA.jl vs AMDGPU.jl vs Metal.jl)

And then have anyone who gets hands on the hardware post the results here :wink:

6 Likes

Are people still running SystemBenchmark.jl (GitHub - IanButterworth/SystemBenchmark.jl: Julia package for benchmarking a system)?

Some extra info:

Phoronix benchmark (basic model):
“Apple M4 Mac Mini With macOS vs. Intel / AMD With Ubuntu Linux Performance” https://www.phoronix.com/review/apple-m4-intel-amd-linux

“With the assortment of CPU benchmarks carried out the Apple M4 Mac Mini on macOS Sequoia was tending to perform similar to the AMD Ryzen 5 9600X / Ryzen 7 9800X3D desktop processors on Ubuntu 24.04 LTS. But where the M4 was really a standout winner was in the performance-per-Watt with the power efficiency typically well in the lead compared to the tested x86_64 desktop processors on Linux.”

And if you need a lot of fast memory (~64 GB), compare the “Mac mini M4 Pro” with the “Mac Studio M2 Max”.

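Minimal comparison (please double check):

|                   | Mac mini M4 Pro 64 GB | Mac Studio M2 Max 64 GB |
|-------------------|-----------------------|-------------------------|
| Price             | $1,999                | $2,399                  |
| Memory bandwidth  | 273 GB/s              | 400 GB/s                |
| Thunderbolt ports | 3x TB5                | 4x TB4                  |
| GPU               | 16-core               | 30-core                 |
| Ethernet          | Gigabit               | 10Gb                    |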

And the Apple Silicon M-series llama.cpp performance

4 Likes