@Elrod I believe you and others (OpenBLAS, MKL etc) use the Strassen algorithm, which has now been *practically* improved on.

TL;DR, multiplying 10-20% faster is considered a breakthrough, now achieved:

we adapted AlphaTensor to specifically find algorithms that are fast on a given hardware, such as Nvidia V100 GPU, and Google TPU v2. These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware, which showcases AlphaTensor’s flexibility in optimising arbitrary objectives.

This means that OpenBLAS, MKL, and your (old and new) code is outdated, even for the 4x5 by 5x5 matrix multiply where I calculate 76/80 = 5% improvement, and 4% for 4x4. Also should some static array size code be special-cased, e.g. for those sizes?

[…] Our system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first time.

## Matrix multiplication

Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes. […]

Despite decades of research following Strassen’s breakthrough, larger versions of this problem have remained unsolved – to the extent that it’s not known how efficiently it’s possible to multiply two matrices that are as small as 3x3. […]

In our paper, we explored how modern AI techniques could advance the automatic discovery of new matrix multiplication algorithms. Building on the progress of human intuition, AlphaTensor discovered algorithms that are more efficient than the state of the art for many matrix sizes. Our AI-designed algorithms outperform human-designed ones, which is a major step forward in the field of algorithmic discovery.

## The process and progress of automating algorithmic discovery

First, we converted the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is.

[…]

For example, if the traditional algorithm taught in school multiplies a 4x5 by 5x5 matrix using 100 multiplications, and this number was reduced to 80 with human ingenuity, AlphaTensor has found algorithms that do the same operation using just 76 multiplications.Beyond this example, AlphaTensor’s algorithm improves on Strassen’s two-level algorithm in a finite field for the first time since its discovery 50 years ago. These algorithms for multiplying small matrices can be used as primitives to multiply much larger matrices of arbitrary size.

[…]

Leveraging this diversity, we adapted AlphaTensor to specifically find algorithms that are fast on a given hardware, such as Nvidia V100 GPU, and Google TPU v2. These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware, which showcases AlphaTensor’s flexibility in optimising arbitrary objectives.## Exploring the impact on future research and applications

From a mathematical standpoint, our results can guide further research in complexity theory, which aims to determine the fastest algorithms for solving computational problems. By exploring the space of possible algorithms in a more effective way than previous approaches, AlphaTensor helps advance our understanding of the richness of matrix multiplication algorithms. Understanding this space may unlock new results for helping determine the asymptotic complexity of matrix multiplication, one of the most fundamental open problems in computer science.

Because matrix multiplication is a core component in many computational tasks, spanning computer graphics, digital communications, neural network training, and scientific computing, AlphaTensor-discovered algorithms could make computations in these fields significantly more efficient. AlphaTensor’s flexibility to consider any kind of objective could also spur new applications for designing algorithms that optimise metrics such as energy usage and numerical stability, helping prevent small rounding errors from snowballing as an algorithm works.

While we focused here on the particular problem of matrix multiplication, we hope that our paper will inspire others in using AI to guide algorithmic discovery for other fundamental computational tasks. Our research also shows that AlphaZero is a powerful algorithm that can be extended well beyond the domain of traditional games to help solve open problems in mathematics. Building upon our research, we hope to spur on a greater body of work – applying AI to help society solve some of the most important challenges in mathematics and across the sciences.

AlphaTensor is not mentioned at:

Since the bound on omega 2.8074 of Strassen has already been improved, even to 2.3728596 in 2020, but none of the newer algorithms have been practical, because of a large constant factor.

Freivalds’ algorithm, a simple Monte Carlo algorithm that, given matrices A, B and C, verifies in Θ(

n²) time ifAB=C.

That’s only to verify (is it commonly used?), and omega of 2 is the lower-bound to multiply.

Here we report a deep reinforcement learning approach based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. Our agent, AlphaTensor, is trained to play a single-player game where the objective is finding tensor decompositions within a finite factor space. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor’s algorithm improves on Strassen’s two-level algorithm for the first time, to our knowledge, since its discovery 50 years ago

[…]

see Fig. 1a for how to represent the 2 × 2 matrix multiplication operation as a 3D tensor of size 4 × 4 × 4

[…]

AlphaTensor finds an algorithm for multiplying 4 × 4 matrices using 47 multiplications in Z2, thereby outperforming Strassen’s two-level algorithm, which involves 7² = 49 multiplications. By applying this algorithm recursively, one obtains a practical matrix multiplication algorithm in Z2 with complexity O(N^2.778)

There are 4 independent directories:

- algorithms contains algorithms discovered by AlphaTensor, represented as factorizations of matrix multiplication tensors, and a Colab showing how to load these.
- benchmarking contains a script that can be used to measure the actual speed of matrix multiplication algorithms on an NVIDIA V100 GPU.
- nonequivalence contains 14,236 nonequivalent algorithms discovered by AlphaTensor for the same matrix multiplication problem (multiplying 4x4 matrices), and a Colab that verifies their nonequivalence.
- recombination contains the code we used to decompose larger matrix multiplication tensors by recombining factorizations of smaller ones.