For multiplication of higher-dimensional arrays it is typically easier to think in terms of tensors and tensor contractions: I would suggest looking at some of the tensor packages, e.g.:
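To make the "tensor contraction" framing concrete, here is a minimal sketch (in Python/numpy for illustration, since the specific packages aren't named): multiplying a 3-D array by a matrix over one shared index is a contraction, and it is equivalent to a reshape followed by an ordinary matrix product.

```python
import numpy as np

# Contract a 3-D array A with a matrix B over one shared index k:
# C[i, j, l] = sum_k A[i, j, k] * B[k, l]
A = np.arange(24.0).reshape(2, 3, 4)
B = np.arange(20.0).reshape(4, 5)

C = np.einsum("ijk,kl->ijl", A, B)

# The same contraction expressed as a plain matrix product after reshaping:
C_ref = (A.reshape(-1, 4) @ B).reshape(2, 3, 5)
assert np.allclose(C, C_ref)
```

The reshape trick works whenever the contracted index is the last (or first) axis; for more general index patterns, an einsum-style interface is what the tensor packages provide.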

If I understand the doc correctly (https://github.com/nonkreon/mtimesx/blob/master/doc/mtimesx_20110223.pdf), the package claims to speed up matrix multiplication, but in practice it seems to be mostly useful for complex matrix multiplication. Matlab uses a representation for complex arrays that stores the real and imaginary parts separately (an array of N complex numbers is stored as 2 arrays of N real numbers). This does not map cleanly to BLAS routines, and the native matrix multiplication routines are apparently suboptimal, which this package is supposed to address. Julia does not have this problem: it uses the (much saner) memory layout of storing the real and imaginary parts together, which maps to BLAS directly and is already optimal (provided a good BLAS is used). There's also some functionality for exploiting symmetry, which Julia also does automatically (A'*B dispatches to a single BLAS call).
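The layout difference is easy to demonstrate (sketched in Python/numpy, which uses the same interleaved layout as Julia; the "split" layout below emulates Matlab's):

```python
import numpy as np

z = np.array([1 + 2j, 3 + 4j], dtype=np.complex128)

# Interleaved layout: each real/imaginary pair is adjacent in memory,
# which is exactly the format complex BLAS routines expect.
flat = z.view(np.float64)
assert flat.tolist() == [1.0, 2.0, 3.0, 4.0]

# Split ("planar") layout, as in Matlab: two separate real arrays.
re, im = z.real.copy(), z.imag.copy()

# To hand a split array to a complex BLAS routine, you first have to
# interleave it (an extra copy and shuffle), or avoid BLAS entirely:
interleaved = np.empty(2 * len(re))
interleaved[0::2], interleaved[1::2] = re, im
assert interleaved.tolist() == flat.tolist()
```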

I should say that Tensors.jl provides computations with "tensors" as typically defined in physics, not tensors as another word for multidimensional arrays.

As the author of StructsOfArrays, I imagine you don't think it is totally unreasonable? The example shows it could sometimes provide performance benefits / improve vectorization. I'd be interested in your thoughts there. I dug up the package as an example of why it could be useful, only to see your name on it, meaning you know way better than I do.

Julia's missing values are also basically handled that way (a pair of arrays).
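For concreteness, a minimal sketch of that "pair of arrays" idea (in Python/numpy for illustration; Julia's actual `Array{Union{T,Missing}}` uses a value buffer plus a parallel type-tag byte array):

```python
import numpy as np

# Dense value buffer plus a parallel byte array marking missing entries.
values = np.array([1.0, 0.0, 3.0])        # the slot for a missing value is arbitrary
ismissing = np.array([False, True, False])

# Arithmetic runs over the dense buffer; the mask just propagates alongside.
doubled = values * 2
assert doubled[~ismissing].tolist() == [2.0, 6.0]
```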

Has anyone tried something like StructsOfArrays with dual numbers? If so, did that improve performance?
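For anyone unfamiliar with what that would look like: a dual number carries a value and a derivative, so a struct-of-arrays layout stores those two components in separate contiguous arrays. A rough sketch of the idea (in Python/numpy for illustration; the array names and `f(x) = x^2` example are mine, not from any package):

```python
import numpy as np

# Struct-of-arrays dual numbers: instead of an array of (value, derivative)
# pairs, keep two contiguous arrays so arithmetic vectorizes cleanly.
val = np.array([1.0, 2.0, 3.0])
der = np.array([1.0, 1.0, 1.0])   # seed derivatives dx/dx = 1

# f(x) = x^2: value is x^2, derivative is 2*x*dx by the chain rule.
f_val = val * val
f_der = 2.0 * val * der

assert f_val.tolist() == [1.0, 4.0, 9.0]
assert f_der.tolist() == [2.0, 4.0, 6.0]
```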

I think the argument for struct of arrays is that it's useful to keep things that will be used together close together in memory. If you have an array of structs, it's likely you'll want to work on the array of one component (to add it to another array or something), which requires a stride, and then you can't SIMD easily, for instance. With complex numbers, how often do you need to work on the real part of an array without the imaginary part? Most operations (indeed, any holomorphic operation, which is probably why you used complex numbers in the first place) work on the real and imaginary parts at the same time.
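The stride point can be shown directly (in Python/numpy for illustration, using a hypothetical two-field `x`/`y` struct):

```python
import numpy as np

# Array-of-structs: x/y pairs interleaved in memory.
# Extracting all x's gives a strided view (one 16-byte struct per step).
aos = np.array([(1.0, 10.0), (2.0, 20.0)], dtype=[("x", "f8"), ("y", "f8")])
xs_strided = aos["x"]
assert xs_strided.strides == (16,)   # non-unit stride: hard to SIMD

# Struct-of-arrays: each field is its own contiguous array, unit stride.
soa_x = np.array([1.0, 2.0])
soa_y = np.array([10.0, 20.0])
assert soa_x.strides == (8,)         # contiguous float64: SIMD-friendly
```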