It doesn’t get said enough. Apple is a RAM vendor. 50% margin on the base config. 95% margin on memory upgrades.
Not really true:
It is important to know that HBM is about five times pricier than DDR5. It commands a significantly higher cost due to its superior performance and capacity advantages over standard DRAM. The complexity of constructing HBM memory devices and stacks is also notably higher compared to traditional DDR ICs and modules.
Highest performance comes at a price.
Apple isn’t shipping HBM. They’re shipping LPDDR (although in a non-JEDEC spec).
I was that clever: I bought a Mac mini with 256 GB of internal storage plus an external Thunderbolt SSD with 1 TB. While I find this configuration usable for me, I can’t unconditionally recommend it.
First, having data not where the OS expects it to be tends to cause trouble with access privileges. When setting up my Mac the first time, I intended to keep my user folder on the external disk, which is possible in theory. That worked until an OS update… After prolonged fights, I gave up on it. After all, the main reason I prefer Mac is that I just want to use my computer to get things done, not to mess around with the OS. I then also gave up on keeping my Music and Photos libraries, as well as my Julia projects, on that disk. As a result, even though I don’t store that much by modern standards, my system disk is pretty full now.
Second, I had occasional crashes that forced the computer to restart. While not frequent, that had almost never happened to me before on Macs. I tentatively tracked the crashes to communication glitches with the external disk when running an application from it. After I moved all applications onto the internal disk, the crashes never recurred.
Thanks for sharing.
I haven’t tried this setup myself, since I bought a rather expensive configuration (MBP M1 Max, 64 GB / 2 TB) a few years ago in a professional context and have felt no need to upgrade since.
So the conclusion remains the same: Apple makes very large margin on SSD and RAM upgrades.
The competition is getting better though, and I will be very happy to return to Linux when one of the competitors (AMD, Qualcomm, Intel, NVIDIA…) manages to offer similar performance per watt.
Thanks, I needed a good laugh.
This benchmark is good, but it is not clear whether it exploits all 16 threads or not. Perhaps it would be better to set JULIA_NUM_THREADS=1 and repeat the benchmark; that way we know (or at least have an idea) about single-core performance. Would you be kind enough to update your results with single-core tests? Then I can give you some pseudocode you can run to also test sparse-matrix performance, which is widely used in numerical computing.
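As a hedged sketch of what such a sparse test could look like (matrix size and density here are illustrative choices, not a reference benchmark), something along these lines would measure sparse matrix-vector throughput:

```julia
using SparseArrays, LinearAlgebra

# Illustrative sizes: a 10_000 x 10_000 matrix with ~1% nonzeros
N = 10_000
A = sprand(N, N, 0.01)
x = rand(N)
y = similar(x)

mul!(y, A, x)                  # warm-up run (includes compilation)
t = @elapsed mul!(y, A, x)     # timed run
gflops = 2e-9 * nnz(A) / t     # ~2 flops per stored nonzero
```

Sparse mat-vec is essentially pure memory traffic with irregular access, so it should stress bandwidth even harder than the dense GEMV case.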
Hi Norman!
Could you also benchmark matrix-vector multiply GFLOPS in Julia? I would expect your system to score around 20 GFLOPS based on your memory speed.
Here are the results for both Matrix-Vector multiplication and Matrix-Matrix multiplication with Julia v1.11.5:
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × AMD Ryzen 9 9950X 16-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, generic)
Threads: 16 default, 0 interactive, 8 GC (on 32 virtual cores)
Environment:
JULIA_NUM_THREADS = 16
julia> using LinearAlgebra
julia> N = 449*10*2;
julia> A = rand(N, N); B = rand(N); C = similar(B);
julia> 2e-9N^2 / @elapsed mul!(C,A,B)
# First Run 1.5482084452662266
julia> 2e-9N^2 / @elapsed mul!(C,A,B)
14.976101777558263
julia> 2e-9N^2 / @elapsed mul!(C,A,B)
14.737536001836334
julia> 2e-9N^2 / @elapsed mul!(C,A,B)
15.0477971386375
julia> 2e-9N^2 / @elapsed mul!(C,A,B)
14.928891417753537
julia> A = rand(N, N); B = rand(N, N); C = similar(B);
julia> 2e-9N^3 / @elapsed mul!(C,A,B)
# First Run 1256.2057391195665
julia> 2e-9N^3 / @elapsed mul!(C,A,B)
1736.3682467430988
julia> 2e-9N^3 / @elapsed mul!(C,A,B)
1741.1184796553234
julia> 2e-9N^3 / @elapsed mul!(C,A,B)
1744.847565528222
I tried the nightly version of Julia as well. The results are similar.
Hi! From what I understand, JULIA_NUM_THREADS wouldn’t make a difference here, as OpenBLAS manages its own threads. I tried starting Julia with --threads 1 and it doesn’t seem to change the results. It seems that OpenBLAS will use 16 threads anyway:
julia> BLAS.get_num_threads()
16
You can use BLAS.set_num_threads(1).
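For example, a single-core run without restarting Julia might look like this (a sketch; BLAS.set_num_threads controls OpenBLAS independently of JULIA_NUM_THREADS, and the matrix size here is arbitrary):

```julia
using LinearAlgebra

BLAS.set_num_threads(1)            # pin OpenBLAS to a single thread
N = 2_000
A = rand(N, N); B = rand(N); C = similar(B)
mul!(C, A, B)                      # warm-up (compilation + first touch)
gflops = 2e-9N^2 / @elapsed mul!(C, A, B)
BLAS.set_num_threads(Sys.CPU_THREADS)  # restore the default afterwards
```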
Thanks Norman!
The 15 GFLOPS result is consistent with the 60 GB/s bandwidth observed in the STREAM Triad benchmark conducted by Phoronix. It’s actually slower than the six-year-old Cascade Lake 10980XE, which achieved 19 GFLOPS in Julia as reported by @Elrod.
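The arithmetic behind that consistency: a Float64 GEMV streams 8 bytes per matrix element and does 2 flops on it, so a memory-bound GEMV tops out at roughly bandwidth/4 (assuming the matrix traffic dominates and the vectors stay in cache):

```julia
bytes_per_element = 8     # Float64
flops_per_element = 2     # one multiply + one add per matrix entry
bandwidth_gbs     = 60.0  # STREAM Triad figure from the Phoronix run
expected_gflops   = bandwidth_gbs * flops_per_element / bytes_per_element
# -> 15.0, matching the measured ~15 GFLOPS
```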
I’ll be publishing a video in the coming days where I did some Linear Algebra performance testing on the M4 Max and M3 Ultra in Julia. In this particular test, the M4 Max is 5 times faster and achieves comparable STREAM bandwidth on 12 P-cores to a 96 core 12 channel DDR5 Zen 5 Epyc.
Note that the Intel 10980XE has 4 channels of DDR4, while your AMD 9950X has 2 channels of DDR5.
The DDR5 probably has less than twice the bandwidth of the DDR4 (depending on how you clock it; IIRC my DDR4 is at 3200), so having 2x the channels means the six-year-old chip should still come out ahead in most cases.
12 channels on the Zen5 Epyc is much better.
It probably suffers from needing a large number of cores to realize that bandwidth?
Have you tried setting different thread counts when comparing to the M3 Ultra and M4 Max?
My source for the Epyc was a benchmark run from Phoronix: AMD EPYC Turin 8c Vs. 12c Memory Channel DDR5 Comparison Benchmarks - OpenBenchmarking.org
However, I’ve seen that the latest report by Fujitsu gives a much higher Triad BW for the same processor with the same DDR5-6000 RAM: https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-performance-report-primergy-rx2450-m2-ww-en.pdf
It’s as if they applied a 1.33 multiplier for the TRIAD to account for potential write-allocate?
I was wondering the same about my M4 Max and M3 Ultra results: they achieved 25% higher BW in the Update kernel compared to the Triad kernel, but about the same in the copy kernel. This is something I intend to investigate further in the future.
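For what it’s worth, the 1.33 factor is exactly what write-allocate accounting would predict for the Triad kernel (a[i] = b[i] + s*c[i]):

```julia
# Triad logically moves 3 arrays: read b, read c, write a.
# With write-allocate, the cache line for a is read before it is
# written, so the hardware actually moves 4 streams.
logical_streams = 3
actual_streams  = 4
multiplier = actual_streams / logical_streams   # ≈ 1.333
```

Crediting the benchmark with the extra hidden read stream is a known way some vendor reports inflate the Triad number relative to the classic STREAM counting rules.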
If you need an alternative to the Mac mini M4 Pro, please consider this AMD CPU: https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-plus-395.html
It has much higher RAM bandwidth than the Ryzen 9950X. For example, this PC: Framework | Order a Framework Desktop with AMD Ryzen™ AI Max 300
It comes with 32 GB to 128 GB of unified memory, enabling up to 256 GB/s of memory bandwidth, compared to the Ryzen 9 9950X’s theoretical 96 GB/s.
But that one is much more expensive.
The base model is a lot more expensive than the base Mac mini, but once you’re looking at 64 GB of RAM (which you’ll want if you want to make use of all that memory bandwidth), the prices equalize, and the Framework machine can go up to 128 GB of RAM and its SSD options are way cheaper. (It also comes with a much beefier GPU and more CPU cores.)