First of all, I would like to thank everyone for the effort put into v1.7, which supports the Apple silicon CPU.
Just two years or so ago, nobody understood why Apple was moving away from standard GPU libraries like OpenCL. Now we can appreciate how powerful Apple’s SoC is and why it needs a new, dedicated Metal layer.
That said, could we expect Julia to be able to leverage the full power of Apple Silicon beyond the CPU, i.e. the Neural Engine and GPU? If this comes true, many linear algebra operations like matrix multiplication could see a 10x gain in performance!
If the past 10 years of improving their Axx chips tell us anything, it is that Apple could dominate and revolutionize computing hardware in the very near future, if not already. There are already a lot of benchmarks showing that the M1 Pro/M1 Max is far more powerful than top-configured PCs.
It would be really nice for Julia, a new language that emphasizes performance, to work natively with Apple silicon’s full power. That would be a dream for scientific computing!
...
# AMX: Apple Matrix coprocessor
#
# This is an undocumented arm64 ISA extension present on the Apple M1. These
# instructions have been reversed from Accelerate (vImage, libBLAS, libBNNS,
# libvDSP and libLAPACK all use them), and by experimenting with their
# behaviour on the M1. Apple has not published a compiler, assembler, or
# disassembler, but by calling into the public Accelerate framework
# APIs you can get the performance benefits (fast multiplication of big
# matrices). This is separate from the Apple Neural Engine.
#
# Warning: This is a work in progress, some of this is going to be incorrect.
#
# This may actually be very similar to Intel Advanced Matrix Extension (AMX),
# making the name collision even more confusing, but it's not a bad place to
# look for some idea of what's probably going on.
...
Is reverse engineering required to take full advantage of these chips? What are developers outside Apple expected to do? Is Apple’s idea to eventually support a compiler?
Do you mean laptops or some special use cases, where they built special hardware support?
Because multi-core applications especially still seem to be dominated by far by AMD and Intel, and for single-threaded applications “top-configured PCs” seem to be at about eye level from what I can tell… E.g. my one-year-old Ryzen 5800X scores 28480 points on cpubenchmarks, while the M1 Pro 10-core scores 23800 points, and my CPU isn’t really a top configuration anymore…
Still super impressive considering it’s a new processor doing this at ~40% of the power consumption in a fanless laptop, but that seems quite far away from “way more powerful than any top-configured PCs”…
I’m not 100% up to date on special-purpose benchmarks, so if you have any serious benchmarks that back up that claim (not those ominous ones from Apple^^), I’d be pretty interested to see them!
It is a bit early for M1 Pro/Max benchmarks, but there is a large fraction (IMO a large majority) of scientific simulations that are bounded by memory bandwidth. Just about every large CFD, mechanics, or wave-propagation code (all mesh-based PDE solvers) is in this situation. So when a laptop is supposed to bring 400 GB/s to CPU computation (20 times more than usual laptops) at super low power, I can’t help thinking this could be the most significant hardware step since GPGPU or multicore processors. Although I am not quite sure yet that these figures will effectively be converted into large accelerations for simulation codes: for example, I wonder about a potential latency increase.
There is an undocumented matrix coprocessor in some M1 chips. Outside developers are apparently expected to use Apple’s libraries (e.g. Apple vecLib BLAS) in order to take advantage of it.
5 sec. into the video: “they smoked high-end RTX graphics Windows machines”. Really?
3:30 in:
1440P Aztec Ruins Offscreen
(FPS higher is better)
310 FPS for M1 Max vs 205 FPS for $15,000 Mac Pro Vega II
No, higher FPS (than 205) isn’t better in itself. I’m sure it is better there, all else being equal. RTX cards, however, have ray tracing, which is what I would want if I cared about graphics. Graphics are not just about speed/FPS; 60 FPS should be OK assuming it’s constant rather than average (and with temporal and spatial anti-aliasing, it might not be). I think people go for 100+ FPS for temporal anti-aliasing; there might be more clever ways to do it with less than 100 FPS. The old benchmarks seem meaningless. I’m sure you can find programs, related to graphics or not, where the M1 RAM limitation is a problem.
I have had 3 generations of Mac laptops, and over those ~10 years the Apple power-supply cables simply rotted away and I had to buy new ones, as well as new power supplies. Unbelievable “Apple quality”, but that is my 10 years of experience with Apple.
While we are now able to provide pre-built Julia binaries for this platform, its support is currently considered [tier 3]
I believe everything should work with Rosetta, and actually most things work with native M1 binaries:
Remember that also the x86-64 (Intel) binaries of Julia can run on these machines, thanks to the Rosetta 2 compatibility layer, albeit with a reduced performance.