[Edit: See my next comment on August 2020 Sedenion-networks, and there and here a bit on Octonion-networks. I changed the title yet again, replacing Octonion with Sedenion so as not to make it too long.]
The specific four-dimensional algebra of quaternion numbers, including the Hamilton product, allows quaternion-valued models to consistently outperform equivalent real-valued neural networks.
Last time I checked, complex-valued ANNs weren’t mainstream (the people I asked didn’t even know of them), and I have only now read about quaternion-valued ones. Neither is a brand-new idea (in fact both are very old), and it seems this could be a killer application for Julia. I think Julia would be ideal to explore this gap:
The Hamilton product is a powerful but expensive operation. In current implementations, this operation is not fully parallelized on GPUs, implying a longer training time for quaternion neural networks. A proper CUDA implementation of this product would drastically reduce this computation time and makes of QNNs a mandatory alternative to real-valued NNs.
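For reference, here is a minimal sketch of the Hamilton product that the quote calls expensive, written in plain Python for illustration only. The function name `hamilton` and the tuple layout `(real, i, j, k)` are my own assumptions, not taken from the survey. Each product costs 16 real multiplications and 12 additions, which is why a fused GPU kernel would matter.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Basis quaternions i, j, k.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

print(hamilton(i, j))  # (0, 0, 0, 1), i.e. i*j = k
print(hamilton(j, i))  # (0, 0, 0, -1), i.e. j*i = -k (not commutative)
```

The two prints show the non-commutativity that a quaternion layer has to respect: the order of weight and input matters.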
All else being equal, it seems quaternion-valued networks would be better, but everything else isn’t equal: architectures are quickly evolving, e.g. with transformer networks (not mentioned in the survey article) and capsule networks, so it may not help to only extend outdated convolutional networks to QCNNs:
New architectures: Despite a recent QCNN and QRNN, new neural network architectures are still missing. For example, capsule networks or generative adversarial neural networks could benefit from the introduction of quaternion numbers.
Then, while bigger neural networks allow better performances, quaternion neural networks make it possible to obtain comparable or better results on the same task, but with four times less neural model parameters. Indeed, a 4-number quaternion weight linking two 4-number quaternion units only has four degrees of freedom, whereas a standard neural net parametrization has 4×4=16, i.e., a fourfold saving in memory. Therefore, the natural multidimensional representation of quaternions alongside with their ability to drastically reduce the number of parameters indicate that hyper-complex numbers are a better fit than real numbers to create more efficient models in multidimensional spaces.
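The "four degrees of freedom versus 16" argument can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names `left_mult_matrix`, `real_dense_params` and `quaternion_dense_params` are hypothetical, biases are ignored, and layer widths are assumed to be divisible by 4. Multiplying by a quaternion weight from the left is the same as applying a 4×4 real matrix whose 16 entries are built from only 4 free values.

```python
def left_mult_matrix(w):
    """4x4 real matrix M such that M @ x equals the Hamilton product w * x."""
    a, b, c, d = w
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]


def real_dense_params(n_in, n_out):
    """Weights of a real-valued dense layer (biases ignored)."""
    return n_in * n_out


def quaternion_dense_params(n_in, n_out):
    """Weights of a quaternion dense layer over the same number of real features."""
    assert n_in % 4 == 0 and n_out % 4 == 0, "widths must be multiples of 4"
    # (n_in / 4) x (n_out / 4) quaternion weights, 4 real numbers each.
    return 4 * (n_in // 4) * (n_out // 4)


print(real_dense_params(256, 512))        # 131072
print(quaternion_dense_params(256, 512))  # 32768, four times fewer
```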
New learning algorithms: Real-world applications of current QNN architectures are based on the straightforward extension of the real-valued backpropagation to the quaternion domain. The recent GHR calculus makes it possible to propose well-adapted learning algorithms that can speed up the training, and increase the performances due to a better consideration of the quaternion algebra. Such learning algorithms must be improved and deployed in state-of-the-art QNN architectures to fully expose the potential of the GHR calculus.
New data preprocessing methods have to be investigated to naturally project the features into the quaternion space, such as the quaternion Fourier transform (Hitzer 2007).
[…] it seems like complex-valued neural networks can outperform real-valued NNs. [that link is to interesting papers and code]
[list of papers]
There is more …
Well, if quaternions are still too simple, then have a look at octonions. It seems like they could outperform quaternions by a tiny bit. And if that is too complicated for you, then drop back to complex numbers.
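To make the ladder from complex numbers to quaternions, octonions and sedenions concrete, here is a minimal sketch of the Cayley-Dickson construction in plain Python. The construction itself is not mentioned in the quoted posts, and the function names `conj`, `mul` and `unit` are my own. It uses one common convention, (a, b)(c, d) = (ac − d*b, da + bc*), on flat coefficient lists whose length is a power of two.

```python
def conj(x):
    """Conjugate: keep the real part, negate every imaginary component."""
    return [x[0]] + [-v for v in x[1:]]


def mul(x, y):
    """Cayley-Dickson product of two coefficient lists of equal power-of-two length."""
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]  # plain real multiplication
    h = n // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    # (a, b)(c, d) = (a*c - conj(d)*b, d*a + b*conj(c))
    left = [p - q for p, q in zip(mul(a, c), mul(conj(d), b))]
    right = [p + q for p, q in zip(mul(d, a), mul(b, conj(c)))]
    return left + right


def unit(n, k):
    """k-th basis element of the n-dimensional algebra."""
    e = [0] * n
    e[k] = 1
    return e


# Quaternions (n = 4): i * j = k, but j * i = -k.
print(mul(unit(4, 1), unit(4, 2)))  # [0, 0, 0, 1]
print(mul(unit(4, 2), unit(4, 1)))  # [0, 0, 0, -1]

# Octonions (n = 8): associativity is lost as well.
e1, e2, e4 = unit(8, 1), unit(8, 2), unit(8, 4)
print(mul(mul(e1, e2), e4) == mul(e1, mul(e2, e4)))  # False
```

The same `mul` works unchanged for sedenions (length 16). Each doubling gives up another algebraic property, which is part of why the gains from going beyond quaternions are not automatic.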
Three papers from last year that seem interesting:
Deep Octonion Networks
Quaternion-valued multi-layer perceptrons (QMLP), and autoencoders (QAE) have been introduced to capture such latent dependencies, alongside to represent multidimensional data. Nonetheless, a three-layered neural network does not benefit from the high abstraction capability of DNNs. The paper proposes first to extend the hyper-complex algebra to deep neural networks (QDNN) and, then, introduces pre-trained deep quaternion neural networks (QDNN-AE) with dedicated quaternion encoder-decoders (QAE). The experiments conducted on a theme identification task of spoken dialogues from the DECODA data set show, inter alia, that the QDNN-AE reaches a promising gain of 2.2% compared to the standard real-valued DNN-AE.
Similarly to capsules, quaternions allow the QRNN to code internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performances than RNN and LSTM in a realistic application of automatic speech recognition. Finally, we show that QRNN and QLSTM reduce by a maximum factor of 3.3x the number of free parameters needed, compared to real-valued RNNs and LSTMs to reach better results, leading to a more compact representation of the relevant information.