Why is Flux's data input format different?


In Flux models (created using Chain), we pass the data as an array of shape D x N (D - data dimension, N - number of samples). This is different from other ML libraries such as TensorFlow/PyTorch, where we use the N x D format.

This also results in the output (for a scalar prediction) being a 1 x N matrix rather than an N-dimensional vector (which is more intuitive, at least to me).

I am curious: what is the reason for this design?


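Because PyTorch and TensorFlow are written in C/C++, which uses a row-major layout for matrices, whereas Julia uses a column-major layout. In a column-major D x N matrix, each sample is a column whose D values sit next to each other in memory, just as each row of an N x D matrix does in a row-major language. Therefore, for efficiency, the order of the dimensions is reversed.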
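A small sketch in plain Julia may make this concrete. It does not use Flux itself: W and b are hypothetical hand-written stand-ins for a layer with one output, and the numbers are made up for illustration. It shows that a sample (column) is contiguous in memory, and how to turn the 1 x N output into a plain vector.

```julia
# Julia stores matrices column by column (column-major order).
X = [1.0 2.0 3.0;
     4.0 5.0 6.0]          # D = 2 features, N = 3 samples

# vec walks the underlying memory, so each sample (column) comes out contiguous.
flat = vec(X)              # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]

# Hand-written stand-in for a layer with a single output (assumption, not Flux's code).
W = [0.5 -1.0]             # 1 x D weight matrix
b = [0.1]                  # bias of length 1
Y = W * X .+ b             # 1 x N matrix, one prediction per column

# Drop the singleton dimension to get a length-N vector.
y = vec(Y)
```

So if the 1 x N shape is inconvenient, calling vec on the output gives the N-element vector you expected.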