Apple silicon full power

First of all, I would like to thank everyone for the effort put into v1.7, which supports the Apple silicon CPU :100: :bowing_man:

Just two years or so ago, nobody understood why Apple was moving away from standard GPU libraries like OpenCL. Now we can appreciate how powerful Apple’s SoC is and why it needs a new, dedicated Metal layer.

That said, could we expect Julia to be able to leverage the full power of Apple silicon beyond the CPU, i.e. the Neural Engine and the GPU? If that happens, many linear algebra operations like matrix multiplication could see a 10x gain in performance!
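
For reference, here is a minimal sketch (plain Julia, nothing Apple-specific) to put a number on matrix-multiplication throughput on whatever machine you have, so such gains can actually be measured:

```julia
# Illustrative matmul throughput check; 2n^3 is the standard
# operation count for a dense n×n matrix multiplication.
n = 4096
A, B = rand(n, n), rand(n, n)
A * B                                      # warm-up, forces compilation
t = @elapsed A * B
println("≈ ", 2n^3 / t / 1e9, " GFLOP/s")
```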

If the past 10 years of improvements to their Axx chips tell us anything, it is that Apple may well dominate and revolutionize computing hardware in the very near future, if it hasn’t already. There are already plenty of benchmarks showing the M1 Pro/M1 Max to be way more powerful than any top-configured PC.

It would be really nice for Julia, as a new language that emphasizes performance, to run natively with Apple silicon’s full power. That would be a dream for scientific computing!

9 Likes

Not if clusters/supercomputers don’t start using Apple silicon.

2 Likes

IMHO:
The first step has already begun: reverse engineering.

Apple Matrix coprocessor - Reverse Engineering

...
# AMX: Apple Matrix coprocessor
#
# This is an undocumented arm64 ISA extension present on the Apple M1. These
# instructions have been reversed from Accelerate (vImage, libBLAS, libBNNS,
# libvDSP and libLAPACK all use them), and by experimenting with their
# behaviour on the M1. Apple has not published a compiler, assembler, or
# disassembler, but by calling into the public Accelerate framework
# APIs you can get the performance benefits (fast multiplication of big
# matrices). This is separate from the Apple Neural Engine.
#
# Warning: This is a work in progress, some of this is going to be incorrect.
#
# This may actually be very similar to Intel Advanced Matrix Extension (AMX),
# making the name collision even more confusing, but it's not a bad place to
# look for some idea of what's probably going on.
...

Apple M1 Neural Engine - Reverse Engineering

Apple M1 GPU - Reverse Engineering

And as usual, by adding “reverse engineering” to the keywords … you can check the latest status.

Related thread:

4 Likes

Is reverse engineering required to take full advantage of the chips? What are developers outside Apple expected to do? Is Apple’s idea to provide a compiler?

It’s already a thing; it’s called Apple Clang.

2 Likes

Do you mean laptops or some special use cases, where they built special hardware support?

Because multi-core applications especially still seem to be dominated by far by AMD + Intel, and for single-threaded applications “top-configured PCs” seem to be roughly at eye level, from what I can tell… E.g. my 1-year-old Ryzen 5800X scores 28,480 points on cpubenchmarks, while the M1 Pro 10-core scores 23,800 points, and my CPU isn’t really a top configuration anymore…

Still super impressive considering it’s a new processor, doing that at ~40% of the power consumption in a fanless laptop, but it seems quite far from “way more powerful than any top-configured PC”…

I’m not 100% up to date on special-purpose benchmarks, so if you have any serious benchmarks that back up that claim (not those dubious ones from Apple^^), I’d be pretty interested to see them! :slight_smile:

4 Likes

Using only the high-level private API ≠ “taking full advantage”.

For low-level assembly tuning, good documentation is required.

1 Like

It is a bit early for M1 Pro/Max benchmarks, but a large fraction (IMO a large majority) of scientific simulations are bounded by memory bandwidth. Just about every large CFD, mechanics, and wave-propagation code (all mesh-based PDE solvers) is in this situation. So when a laptop is supposed to bring 400 GB/s to CPU computation (20 times more than usual laptops) at super low power, I can’t help thinking that this could be the most significant hardware step since GPGPU or multi-core processors. Although I am not quite sure yet that these figures will effectively translate into large speedups for simulation codes: for example, I wonder about a potential latency increase.
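
To make “memory-bandwidth-bound” concrete, here is a minimal STREAM-triad-style sketch (illustrative names and sizes, not a rigorous benchmark):

```julia
# Triad kernel: performance is limited by how fast three arrays stream
# through memory, not by the arithmetic.
function triad!(a, b, c, s)
    @inbounds @simd for i in eachindex(a, b, c)
        a[i] = b[i] + s * c[i]
    end
    return a
end

n = 10^8
a, b, c = zeros(n), rand(n), rand(n)
triad!(a, b, c, 2.0)                  # warm-up
t = @elapsed triad!(a, b, c, 2.0)
# three Float64 arrays (8 bytes each) move per index: 24 bytes
println("effective bandwidth ≈ ", 24n / t / 1e9, " GB/s")
```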

8 Likes

There is an undocumented matrix coprocessor in some M1 chips. Outside developers are apparently expected to use Apple’s libraries (e.g. Apple vecLib BLAS) in order to take advantage of it.
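
For example, a minimal sketch (not an official binding; assumes macOS, column-major Float64 matrices, and the standard CBLAS enum values) of ccall-ing Accelerate’s dgemm directly from Julia:

```julia
# Call Accelerate's cblas_dgemm directly; per the reverse-engineering notes
# above, Accelerate routes big matmuls through the AMX coprocessor.
const Accelerate = "/System/Library/Frameworks/Accelerate.framework/Accelerate"
const ColMajor, NoTrans = Cint(102), Cint(111)   # standard CBLAS enums

function accel_gemm!(C::Matrix{Float64}, A::Matrix{Float64}, B::Matrix{Float64})
    m, k = size(A)
    n = size(B, 2)
    @assert size(B, 1) == k && size(C) == (m, n)
    ccall((:cblas_dgemm, Accelerate), Cvoid,
          (Cint, Cint, Cint, Cint, Cint, Cint,
           Cdouble, Ptr{Cdouble}, Cint, Ptr{Cdouble}, Cint,
           Cdouble, Ptr{Cdouble}, Cint),
          ColMajor, NoTrans, NoTrans, m, n, k,
          1.0, A, m, B, k, 0.0, C, m)
    return C
end
```

(The AppleAccelerate.jl package wraps this kind of thing more conveniently.)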

5 Likes

Tests like this are flooding the web:

1 Like

I don’t think we need to reverse engineer the chips. Instead, it should be good enough for Julia to be able to call Metal and Accelerate.
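
Something along these lines, a hypothetical usage sketch assuming a Metal.jl-style GPU array package (as in the JuliaGPU organization):

```julia
using Metal   # assumed: JuliaGPU's Metal.jl GPU array package

a = MtlArray(rand(Float32, 1024, 1024))   # upload to GPU memory
b = MtlArray(rand(Float32, 1024, 1024))
c = a .+ 2f0 .* b                          # broadcast executes on the GPU
Array(c)                                   # copy the result back to the CPU
```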

2 Likes

And clearly that’s a laptop. Worse, it’s Apple’s own laptop, which had serious thermal-throttling issues.

Julia should be able to use the Apple libraries, right? (Whether someone spends the effort to enable that is of course a relevant question.)

2 Likes

I have a new MacBook Pro in the mail and would really prefer to continue building ML/AI apps in Julia, so I hope this will come.

And this one is massive:

5 sec. into the video: “they smoked high-end RTX graphics Windows machines”. Really?

3:30 in:

1440P Aztec Ruins Offscreen
(FPS higher is better)

310 FPS for M1 Max vs 205 FPS for $15,000 Mac Pro Vega II

No, higher FPS (than 205) isn’t better per se. I’m sure it is better there, all else being equal. RTX cards, however, have ray tracing, which is what I would want if I cared about graphics. Graphics are not just about speed/FPS: 60 FPS should be OK, assuming it’s a constant rate rather than an average (and with temporal and spatial anti-aliasing it might not be). I think people go for 100+ FPS because of temporal anti-aliasing, though there might be cleverer ways to do it with less than 100 FPS. The old benchmarks seem meaningless. I’m sure you can find programs, graphics-related or not, where the M1’s RAM limitation is a problem.

And will those new Macs still rot the cables once a year :smiling_imp:

Can someone explain what this sentence below means?
And will those new Macs still rot the cables once a year

I tried to read it ten times and I am still confused. English is not my first language.

OK, sorry. It was just a silly joke.

I have had 3 generations of Mac laptops, and over those ~10 years the Apple cables (for the power supply) just rotted and I had to buy new ones, as well as new power supplies. Unbelievable “Apple quality”, but that is my 10 years of experience with Apple.

FYI: I’m jumping the gun a bit, but note that Apple silicon is still not tier 1 (for the 1.7 RCs or 1.8 master): blog post: Julia 1.7 Highlights by KristofferC · Pull Request #1419 · JuliaLang/www.julialang.org · GitHub

While we are now able to provide pre-built Julia binaries for this platform, its support is currently considered [tier 3]

I believe everything should work with Rosetta, and actually most things with the native M1 binaries:

Remember that also the x86-64 (Intel) binaries of Julia can run on these machines, thanks to the Rosetta 2 compatibility layer, albeit with a reduced performance.
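
A quick way to tell which build you are actually running (native arm64 vs. x86-64 under Rosetta 2) from the REPL:

```julia
julia> Sys.ARCH    # :aarch64 = native Apple silicon; :x86_64 = Rosetta 2
:aarch64
```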

There are still open issues for M1:

2 Likes