Quaternion- (and up to sedenion-) valued neural networks: Parallelizing the Hamilton product on GPUs/CUDA

Note to self: add info here on sedenion networks from the August 2020 paper. First, what I wrote before seeing it:

Have a look at octonions. It seems like they could outperform quaternions by a tiny bit.

Octonion networks are probably worth a closer look. In that paper's fig. 5, the octonion accuracy curve is the best, though only a little better than the quaternion one; both are much better than the real-valued network, and the complex one comes close by epoch 20.

However, the number of parameters is 481,150, half that of the quaternion network, and the real-valued network has 7.5x as many. The count seems to roughly halve at each step from real->complex->quaternions->octonions (a back-of-the-envelope check below).
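To make the halving concrete: in a hypercomplex dense layer, the weight sharing of the algebra product means a layer with n_in real inputs and n_out real outputs stores only n_in*n_out/d real parameters over a d-dimensional algebra. A rough sketch of that arithmetic (`layer_params` is a made-up helper of mine, and this ignores biases and any plain real-valued layers, which is presumably why the paper reports 7.5x rather than the ideal 8x):

```julia
# Real parameter count of a dense hypercomplex layer with n_in real inputs and
# n_out real outputs over a d-dimensional algebra (1 = real, 2 = complex,
# 4 = quaternion, 8 = octonion): the layer stores (n_in/d)*(n_out/d) weights
# of d real components each, i.e. n_in*n_out/d real numbers.
layer_params(n_in, n_out, d) = (n_in ÷ d) * (n_out ÷ d) * d

for (name, d) in [("real", 1), ("complex", 2), ("quaternion", 4), ("octonion", 8)]
    println(rpad(name, 11), layer_params(1024, 1024, d))  # halves at each step
end
```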

So I thought, since there are even higher-order hypercomplex numbers, let's look up sedenions.

There are also other ways to squeeze networks, e.g. where the original is 50x larger. It's unclear why you shouldn't be able to combine such methods with the above for a 50 * 7.5 = 375x factor, and maybe an additional 2x factor from the paper below:

https://www.researchgate.net/publication/343255151_Metacognitive_Sedenion-Valued_Neural_Network_and_Its_Learning_Algorithm

In this paper, a metacognitive sedenion-valued neural network (Mc-SVNN) and its learning algorithm are proposed. Its application to diverse time-series prediction problems is presented. The Mc-SVNN contains two components: a sedenion-valued neural network that represents the cognitive component, and a metacognitive component, which serves to self-regulate the learning algorithm. At each epoch, the metacognitive component decides what, how, and when learning occurs. The algorithm deletes unnecessary samples and stores only those that are used. This decision is determined by the sedenion magnitude and the 15 sedenion phases. The Mc-SVNN is applied to four real-world forecasting problems: USD-to-euro currency exchange rate forecasting, the sunspot number time series, power demand forecasting, and daily temperature prediction in Abu Dhabi. Compared to existing methods, the Mc-SVNN demonstrates superior performance in time-series forecasting while using a smaller number of parameters.

This is the first time I've heard of a real application of sedenions, let alone of the higher-order constructions, which I'm only now hearing about:

Applying the Cayley–Dickson construction to the sedenions yields a 32-dimensional algebra, sometimes called the 32-ions or trigintaduonions.[1] It is possible to apply the Cayley–Dickson construction to the sedenions arbitrarily many times.
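The construction itself is short enough to write down generically. A minimal (and deliberately naive) recursive sketch in Julia, representing a 2^k-ion as a plain vector of 2^k reals; this is my own illustration, not code from any of the papers:

```julia
# Cayley–Dickson conjugation and multiplication for vectors whose length is a
# power of two: 1 = real, 2 = complex, 4 = quaternion, 8 = octonion,
# 16 = sedenion, 32 = trigintaduonion, and so on. Naive and unoptimized.
cd_conj(x) = length(x) == 1 ? copy(x) : vcat(x[1], -x[2:end])

function cd_mul(x, y)
    n = length(x)
    n == 1 && return x .* y
    h = n ÷ 2
    a, b = x[1:h], x[h+1:n]
    c, d = y[1:h], y[h+1:n]
    # (a, b)(c, d) = (ac - conj(d)b, da + b conj(c))
    vcat(cd_mul(a, c) .- cd_mul(cd_conj(d), b),
         cd_mul(d, a) .+ cd_mul(b, cd_conj(c)))
end

# Sanity check at the quaternion level: i*j == k
@assert cd_mul([0.0, 1, 0, 0], [0.0, 0, 1, 0]) == [0.0, 0, 0, 1]
```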

So I had to look up trigintaduonions, which don't have their own Wikipedia page, and I couldn't find anything relating them to neural networks. There are, however, these:

2014 paper:
https://www.researchgate.net/publication/267097033_An_algorithm_for_multiplication_of_trigintaduonions

In this paper we introduce an efficient algorithm for the multiplication of trigintaduonions. The direct multiplication of two trigintaduonions requires 1024 real multiplications and 992 real additions. We show how to compute a trigintaduonion product with 498 real multiplications and 943 real additions. In the synthesis of the discussed algorithm we use the fact that trigintaduonion multiplication may be represented by a vector-matrix product. Such a representation provides a possibility to discover repeating elements in the matrix structure and to use specific properties of their mutual placement to decrease the number of real multiplications needed to compute the product of two trigintaduonions.
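The key observation in that abstract is simple: for fixed x, the map y -> x*y is linear, so the product is a matrix-vector multiply x*y = M(x)*y, and the reduced-multiplication algorithms come from spotting repeated (up to sign) entries in M(x). A quick sketch of the naive representation, reusing `cd_mul` from the Cayley–Dickson snippet above (again just an illustration):

```julia
# Build the 32x32 left-multiplication matrix M(x): column j is x times the
# j-th basis trigintaduonion. Assumes cd_mul from the sketch above is in scope.
function mul_matrix(x)
    n = length(x)
    M = zeros(n, n)
    for j in 1:n
        e = zeros(n); e[j] = 1.0
        M[:, j] = cd_mul(x, e)
    end
    return M
end

x, y = randn(32), randn(32)
@assert mul_matrix(x) * y ≈ cd_mul(x, y)  # naive product: 32^2 = 1024 real mults
```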

2013 paper:
https://www.researchgate.net/publication/256720850_An_algorithm_for_fast_multiplication_of_sedenions

In this work a rationalized algorithm for calculating the product of sedenions is presented which reduces the number of underlying multiplications; reducing the number of multiplications is usually a desirable task in VLSI processor design. The computation of a sedenion product using the naive method takes 256 multiplications and 240 additions, while the proposed algorithm computes the same result in only 122 multiplications (or multipliers, in the hardware implementation case) and 298 additions.

@chakravala, I thought you might know something about this stuff, or at least be interested, and @Elrod, any idea if we can make these calculations fast in Julia (faster than competing neural network libraries can; I doubt any of them even have this implemented yet, except naively)?
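On the GPU side, for the quaternion case at least, the Hamilton product parallelizes trivially if you keep the four components in separate arrays (structure-of-arrays) and write everything as elementwise broadcasts, since the same code should then run unchanged on `Array` or `CuArray`. A sketch of what I have in mind; this is just the naive 16-multiplication product in a batched, GPU-friendly layout, not one of the reduced-multiplication algorithms:

```julia
# Batched Hamilton product: q and r are 4-tuples of same-shaped component
# arrays (w, x, y, z). All operations are elementwise broadcasts, so this
# should work as-is on CPU arrays or on CuArrays on the GPU.
function hamilton(q, r)
    qw, qx, qy, qz = q
    rw, rx, ry, rz = r
    return (qw .* rw .- qx .* rx .- qy .* ry .- qz .* rz,
            qw .* rx .+ qx .* rw .+ qy .* rz .- qz .* ry,
            qw .* ry .- qx .* rz .+ qy .* rw .+ qz .* rx,
            qw .* rz .+ qx .* ry .- qy .* rx .+ qz .* rw)
end

q = ntuple(_ -> randn(Float32, 4096), 4)
r = ntuple(_ -> randn(Float32, 4096), 4)
w, x, y, z = hamilton(q, r)  # 4096 quaternion products in one go
```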

A sedenion is a 16-dimensional hypercomplex number that is obtained by applying the Cayley–Dickson construction to octonion complex numbers. Its algebra is non-commutative, non-associative, and non-alternative, but power-associative.

This matches what I remembered: you lose one more property at each doubling, here alternativity (I'm familiar with the preceding losses). I guess with trigintaduonions you next lose power-associativity, but I'm curious what happens for the next step up, the 64-ions, which don't even have a name. What more is there to lose?
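For what it's worth, the property loss at the sedenion step is easy to confirm numerically with the `cd_mul` sketch from above:

```julia
# Sedenions (length 16): alternativity x(xy) == (xx)y fails in general,
# while the power-associativity instance x(xx) == (xx)x still holds.
x, y = randn(16), randn(16)
@show cd_mul(x, cd_mul(x, y)) ≈ cd_mul(cd_mul(x, x), y)  # false: not alternative
@show cd_mul(x, cd_mul(x, x)) ≈ cd_mul(cd_mul(x, x), x)  # true
```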

Even if these higher-order algebras are slower, they might be worthwhile, since you pay the training cost once, and accuracy is very important. I'm not sure how this affects inference: the calculations see the same order of slowdown, but since your network is smaller, you may be limited by memory bandwidth there too anyway (though less so).