Apple M4 Max AMX Linear Algebra performance versus CPU and GPU

Hello everybody!

I made a video investigation of Apple’s AMX accelerator Linear Algebra performance in Julia: https://www.youtube.com/watch?v=TjfA9LVgHXk

According to my findings, the 2 AMX cores achieve almost 3 times the peak FP32 performance of the 12 P-cores, and close to the same performance as the Ryzen 9950X, in dense matrix-matrix multiplication. At the same time, they are about 10 times more power efficient than the P-cores and about 6 times more power efficient than the Zen 5 cores.
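For anyone who wants to try a quick version of this at home: the video's full benchmark isn't reproduced here, but a minimal stdlib-only sketch of timing a dense FP32 matrix multiply in Julia could look like the following (the size `N` and the single-timing approach are my own choices, not the video's; BenchmarkTools.jl would give more robust numbers):

```julia
using LinearAlgebra

N = 2048
A = rand(Float32, N, N)
B = rand(Float32, N, N)
C = similar(A)

mul!(C, A, B)                 # warm up: force compilation before timing
t = @elapsed mul!(C, A, B)    # seconds for one in-place GEMM

gflops = 2N^3 / t / 1e9       # dense GEMM performs ~2N^3 flops
println("FP32 GEMM: $(round(gflops, digits = 1)) GFLOPS")
```

On Apple Silicon, which BLAS backend Julia dispatches to (OpenBLAS by default, or Accelerate via a package such as AppleAccelerate.jl) determines whether the AMX units are actually exercised.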

Somewhat disappointingly, the AMX cores have the same throughput in FP16 as in FP32, even though Apple's patents made it seem reasonable to expect a 4-fold increase.

In FP64, the AMX core throughput drops to 1/4 that of FP32, as expected. However, this is still superior to the 12 P-core performance.

The other problem I examined was matrix-vector multiplication; there, the performance was memory-bound and roughly matched that of the P-cores.
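The memory-bound behavior is easy to see from the arithmetic intensity: a GEMV reads each of the N² matrix elements exactly once but performs only ~2N² flops, so bandwidth dominates. A rough sketch of estimating the effective bandwidth (again a single-timing approximation of my own, not the video's methodology):

```julia
using LinearAlgebra

N = 8192
A = rand(Float32, N, N)
x = rand(Float32, N)
y = similar(x)

mul!(y, A, x)                 # warm up: force compilation before timing
t = @elapsed mul!(y, A, x)    # seconds for one in-place GEMV

# The N^2 matrix elements dominate the traffic; x and y are negligible.
gbps = sizeof(Float32) * N^2 / t / 1e9
println("Effective bandwidth: $(round(gbps, digits = 1)) GB/s")
```

Comparing that number against the machine's advertised memory bandwidth shows how close GEMV gets to the memory-bound ceiling.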

When comparing against the GPU, it becomes quite apparent why Apple introduced the AMX: it uplifts performance exactly where the GPU is weak (small problem sizes), and it does so with significantly higher power efficiency.

Curiously, I achieved better peak GPU performance in Julia than using Apple’s MLX! Kudos to everyone who made it possible to so easily leverage the GPU in Julia.


Very nice!

Is the code available?

Great video! Just bumped into your video on YouTube and then saw the post here.