M2 Ultra running Julia

PetrKryslUCSD · October 6, 2023, 2:06am

Apple M2 Ultra with 24-core CPU, 60-core GPU, 32‑core Neural Engine
192GB unified memory

How fast would Julia be on this compared with “equivalent” Intel hw?
Serial and threaded, if you have those numbers…

Thanks.

Edit: As pointed out below, I am hopeful that someone has already done some benchmarking.
But anecdotes are welcome too.

nilshg · October 6, 2023, 5:12am

How do you measure how “fast” Julia is? I would say it depends on the workload.

You could maybe run IanButterworth/SystemBenchmark.jl (a Julia package for benchmarking a system, on GitHub) and compare the results across machines.
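For the "serial and threaded" part of the question, a minimal sketch of a workload-specific comparison using only Julia's standard library is shown here. The kernel `work` and the function names are hypothetical placeholders; a real comparison should substitute the actual workload, and start Julia with `julia -t auto` (or another thread count) to get more than one thread.

```julia
using Base.Threads

# Hypothetical compute-bound kernel: sum of sin(k) for k in lo:hi.
# `init = 0.0` keeps the call valid for empty ranges.
work(lo, hi) = sum(sin, lo:hi; init = 0.0)

# Serial reference: one pass over the whole range.
serial_sum(n) = work(1, n)

# Threaded version: split 1:n into contiguous chunks, one task per chunk.
function threaded_sum(n; nchunks = nthreads())
    bounds = round.(Int, range(0, n; length = nchunks + 1))
    tasks = [Threads.@spawn work(bounds[i] + 1, bounds[i + 1]) for i in 1:nchunks]
    return sum(fetch, tasks)
end

n = 10_000_000
serial_sum(10); threaded_sum(10)          # warm-up, so compilation is not timed
t_serial = @elapsed serial_sum(n)
t_threaded = @elapsed threaded_sum(n)
println("threads:  ", nthreads())
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

Running the same script on both machines gives a rough serial and threaded number for that one kernel only; it says nothing about memory-bound or BLAS-heavy workloads.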

RoyiAvital · October 6, 2023, 6:39am

CPU performance-wise, it won’t beat the AMD Ryzen 7950X.
Though the memory subsystem has more bandwidth and RAM.
So in cases where the 7950X is memory-bound, the M2 might not be.

So usually it is better to think of Apple’s CPUs in terms of what they enable.
Mainly since their GPU memory is shared with the CPU, which means they can do things that require a lot of memory (they are very attractive to those who do inference with large models).

I have the M2 Max (the top-tier configuration of the Max).

LaurentPlagne · October 6, 2023, 7:47am

My experience with the M1 Max:

Positive

impressive CPU memory bandwidth: 5x that of an x86 laptop, rivaling Xeon servers
impressive energy efficiency: competes with x86 servers, with no noise and no power plug
impressive memory for the GPU: try to buy a discrete GPU with 192 GB of RAM

Negative

Metal.jl is far less mature than CUDA.jl

Assuming a particular interest in PDEs and the Finite Element Method, the main issue is probably the state of JuliaLinearAlgebra/AppleAccelerate.jl (the Julia interface to the macOS Accelerate framework, on GitHub), which (I may be wrong) does not allow calling the Apple Accelerate sparse solvers.

PetrKryslUCSD · October 6, 2023, 2:17pm

Precisely. My hope was that someone had already done that…

skleinbo · October 6, 2023, 2:29pm

It doesn’t seem like it currently does. I wonder how feasible it is, given that the sparse routines in Accelerate are probably(?) not in one-to-one correspondence with SuiteSparse’s.

I hadn’t given it much thought until I saw this thread, but the speedup from using Accelerate over OpenBLAS can hardly be overstated! Maybe that was clear to everyone but me.

Simple benchmark: matmul on dense 1000x1000 Float32 matrices yields a whopping speedup of ~4x for me (with 4 BLAS threads). Granted, I tried it on a “meager” M1 (not Pro, Max or Ultra), so it might be less pronounced on the beefier SoCs.
Still, pretty great for a drop-in replacement.
Or frustrating, because Apple is, from what I learned today, pretty tight-lipped about their “secret ingredient”, which is an FMA accelerator unit in the Apple Silicon chips dubbed the Apple Matrix Coprocessor (AMX). Apple does not disclose how to use it, and officially makes it accessible only through the Accelerate framework. It’s been reverse engineered though.

Again, might be common knowledge, but TIL!
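A minimal sketch of the kind of matmul benchmark described above, using only the `LinearAlgebra` standard library. The function name `matmul_gflops` is hypothetical. The commented-out `using AppleAccelerate` line is an assumption: on macOS with a recent Julia version, loading that package is expected to route BLAS calls to Accelerate, so the script can be run once without it (OpenBLAS) and once with it.

```julia
using LinearAlgebra

# Assumption: on macOS, uncommenting the next line switches the BLAS backend
# to Apple's Accelerate (requires the AppleAccelerate.jl package).
# using AppleAccelerate

# Time C = A*B for dense n-by-n matrices and report the best of `reps` runs.
function matmul_gflops(n; T = Float32, reps = 5)
    A = rand(T, n, n)
    B = rand(T, n, n)
    C = similar(A)
    mul!(C, A, B)                                   # warm-up run, not timed
    t = minimum(@elapsed(mul!(C, A, B)) for _ in 1:reps)
    return 2 * n^3 / t / 1e9                        # a dense matmul costs about 2n^3 flops
end

BLAS.set_num_threads(4)                             # 4 BLAS threads, as in the post
println(BLAS.get_config())                          # shows which BLAS library is active
println("GFLOP/s: ", matmul_gflops(1000))
```

The ratio of the two reported GFLOP/s figures is the speedup; the result will vary with the chip, the matrix size, and the element type.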

Ronis_BR · October 6, 2023, 2:37pm

In my case, I am using an M1 Max instead of the Ultra. I can say that it easily beats a huge Xeon server when performing complex satellite simulations. In this case, we do not have much room for parallelization, which would favor the Xeon (64 cores).

Given that the M1 Max I am using costs a fraction of the price of the Xeon server, IMHO, the M-arch computers from Apple are the best platforms for this kind of computation.

Just one tip when benchmarking: enable this package, JuliaLinearAlgebra/AppleAccelerate.jl (the Julia interface to the macOS Accelerate framework, on GitHub).

PetrKryslUCSD · October 6, 2023, 2:50pm

Shouldn’t the cost comparison be apples-to-apples? (Pun intended… ;-))
The number of cores, memory, caches, … are likely quite different, aren’t they?

Ronis_BR · October 6, 2023, 4:01pm

Yes, it is! The Xeon is of course much more capable, but its performance for those simulations is way behind. I think the M1 Max completes one scenario in 40% less time. Hence, the relative cost of the M1 would still be better even if I selected a lower-end Xeon, which would likely have worse performance than the current one.

sylvaticus · October 7, 2023, 3:42pm

Sorry to jump into this conversation. We are currently looking for a workstation/server to run simulations for a group of 7-10 researchers (Julia, but also a lot of R and a bit of Python and MATLAB).
Why does nobody mention the AMD Threadripper CPUs? Looking at benchmarks, they seem to have the best single-thread performance (which remains important for many custom programs), while not having the 128 GB RAM limitation of desktop CPUs and still offering a lot of cores…

PetrKryslUCSD · October 7, 2023, 3:52pm

I have just acquired a workstation with an AMD Ryzen 9 7950X 16-Core Processor, 4501 MHz, 128 GB DDR5, cache 1, 16, 64 MB. It is very snappy. My colleague has now purchased an M2 Ultra (it hasn’t arrived yet, so I cannot run comparisons myself).

aminsadeghi · November 4, 2023, 3:02pm

One of the main differences is the max memory bandwidth. For your CPU, Google says ~73 GB/s, which is decent, but nowhere near the 800 GB/s you get with the M2 Ultra. And sparse matrix multiplication, which is at the heart of solving PDEs, is memory bandwidth-limited.
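To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Julia. It assumes a compressed sparse format with Float64 values and Int64 indices, counts only the matrix traffic (one value plus one index per stored nonzero), and uses the peak bandwidth figures quoted above; sustained bandwidth in practice is lower. The function names are hypothetical.

```julia
# Bytes read per stored nonzero in a sparse matrix-vector product,
# counting only the matrix value and its index (vector traffic is ignored).
bytes_per_nnz(Tv, Ti) = sizeof(Tv) + sizeof(Ti)

# Upper bound on sparse matrix-vector throughput in GFLOP/s for a bandwidth
# given in GB/s: each nonzero costs one multiply and one add (2 flops).
spmv_bound_gflops(bw_gbs; Tv = Float64, Ti = Int64) =
    2 * bw_gbs / bytes_per_nnz(Tv, Ti)

for (name, bw) in (("~73 GB/s", 73), ("800 GB/s", 800))
    println(name, " -> at most ", spmv_bound_gflops(bw), " GFLOP/s")
end
```

With 16 bytes per nonzero, the bound works out to about 9 GFLOP/s at 73 GB/s and 100 GFLOP/s at 800 GB/s, regardless of how many cores are available.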
