AMD GPUs and CPUs in the new No. 1 supercomputer, 1.1 exaflops (and others with 10+ "AI exaflops")

In case people missed it, there’s a new No. 1 supercomputer out, Frontier, and it’s the first exascale one.

A.
And it’s 50% more power-efficient than anything out there. Intel and Nvidia are almost gone from the top 10 of the TOP500 and Green500 lists (AMD has the top 4 computers on the Green500).
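To make the efficiency claim concrete, here is a rough sketch of how a Green500-style figure (GFLOPS per watt) falls out of performance and power. The 1.1 exaflops is from above; the power draw of about 21 MW is my assumption and not something stated in this post, so substitute the figure from the list if you have it. The "implied baseline" only shows what "50% more efficient" would mean for the runner-up.

```python
def gflops_per_watt(flops_per_second: float, power_watts: float) -> float:
    """Energy efficiency in GFLOPS per watt (sustained FLOPS divided by power)."""
    return flops_per_second / power_watts / 1e9


FRONTIER_FLOPS = 1.1e18   # 1.1 exaflops, as quoted in this post
ASSUMED_POWER_W = 21e6    # ASSUMPTION: roughly 21 MW, not given in this post

efficiency = gflops_per_watt(FRONTIER_FLOPS, ASSUMED_POWER_W)

# If a machine is 50% more efficient than the next best one,
# the next best one sits at efficiency / 1.5.
implied_baseline = efficiency / 1.5

print(f"efficiency under the assumed power: {efficiency:.1f} GFLOPS/W")
print(f"implied next-best efficiency:       {implied_baseline:.1f} GFLOPS/W")
```

Under that assumed power this lands at a bit over 50 GFLOPS/W; the point is only that the metric is a plain ratio, so either higher sustained FLOPS or lower power moves it.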

AMD can seemingly take all the credit, since both the GPUs and CPUs are AMD-based. However, other things factor in too, e.g. the interconnect, Slingshot 11, which I hadn’t heard about.

“For FP64 you’d want to explore GPUs like Tesla V100 and A100s.”

Well, I would explore AMD… and also Metal.jl. Does anyone know what’s responsible for the 50%+ better efficiency (and the much faster Float64)? I don’t think it can be the interconnect (unless other interconnects really hold performance back; all of them are limited by the same speed of light).

Note that HPE, which makes the AMD-based supercomputers, bought Cray in 2019 if I recall, and Slingshot is their interconnect (so they deserve some credit too):

A high performance network for HPE Cray supercomputers and HPE HPC clusters designed for exascale era supercomputing […]
Ethernet compatible, HPE Slingshot interconnect enables straightforward execution of cloudlike and converged workloads in a high performance computing environment.

B.
It’s unclear to me whether the already “existing” computer really has “20 AI exaflops”, or whether this is just bad reporting.

The Grace CPU Superchip features two CPU chips, connected coherently through an NVIDIA NVLink®-C2C interconnect, with up to 144 high-performance Arm V9 cores with scalable vector extensions and a 1 terabyte-per-second memory subsystem.

Strange statement: “The Swiss National Computing Center’s existing 20 AI ExaFlops Alps supercomputer will”

Taking advantage of the tight coupling between NVIDIA CPUs and GPUs, Alps will be able to train GPT-3, one of the world’s largest natural language processing models, in only two days — 7x faster than NVIDIA’s 2.8-AI exaflops Selene supercomputer, currently recognized as the world’s leading supercomputer for AI by MLPerf.
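As a sanity check on the quoted numbers, the "7x faster" figure lines up with the ratio of the two quoted "AI exaflops" values. The sketch below uses only the numbers from the quotes above; the implied Selene training time assumes training time scales inversely with peak "AI exaflops", which is my simplification and not something NVIDIA states.

```python
ALPS_AI_EXAFLOPS = 20.0     # from the quoted announcement
SELENE_AI_EXAFLOPS = 2.8    # from the quoted announcement
ALPS_GPT3_DAYS = 2.0        # "in only two days", from the quote


def speedup(new_flops: float, old_flops: float) -> float:
    """Ratio of peak throughput between two machines."""
    return new_flops / old_flops


ratio = speedup(ALPS_AI_EXAFLOPS, SELENE_AI_EXAFLOPS)

# ASSUMPTION: training time is inversely proportional to peak AI exaflops.
implied_selene_days = ALPS_GPT3_DAYS * ratio

print(f"Alps / Selene peak ratio: {ratio:.2f}x")
print(f"implied GPT-3 training time on Selene: {implied_selene_days:.1f} days")
```

So the 7x looks like it is simply 20 / 2.8 rounded down, i.e. a ratio of peak numbers rather than a measured training run.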

It’s unclear to me whether the new Arm-based CPUs are based on AWS’s, given the previous announcement: “NVIDIA GPU + AWS Graviton2-Based Amazon”.
