Tensor dimension order on convolution layer

According to the Flux documentation (https://fluxml.ai/Flux.jl/v0.10/models/layers/#Convolution-and-Pooling-Layers-1), the dimension order for input data is WHCN (width, height, number of channels, batch size).

I am a bit confused about why this ordering is used.

  1. When an image is loaded, its natural dimension order is (height, width). Hence, to feed the data into a convolution layer we need to transpose it, which seems unnecessary.
  2. The example in the model zoo (https://github.com/FluxML/model-zoo/blob/master/vision/cifar10/cifar10.jl) doesn't actually follow the documentation: it uses HWCN order instead of WHCN. One could argue, though, that for images with width = height the ordering probably makes no difference.
  3. For comparison, PyTorch uses NCHW order.
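One point that may help with the comparison in item 3: Julia arrays are column-major (first index varies fastest), while PyTorch tensors are row-major by default (last index varies fastest). Reversing the dimension order under a switch of major-ness leaves the memory layout unchanged, so Julia's WHCN and PyTorch's NCHW should describe the same bytes in memory. The sketch below checks this with hand-written offset formulas in plain Python; `col_major_offset` and `row_major_offset` are hypothetical helpers, not part of Flux or PyTorch, and the sizes are arbitrary.

```python
from itertools import product

def col_major_offset(idx, shape):
    """Linear offset of a 0-based index in a column-major array
    (Julia/Fortran): the first index varies fastest."""
    offset, stride = 0, 1
    for i, n in zip(idx, shape):
        offset += i * stride
        stride *= n
    return offset

def row_major_offset(idx, shape):
    """Linear offset of a 0-based index in a row-major array
    (C, NumPy and PyTorch by default): the last index varies fastest."""
    offset, stride = 0, 1
    for i, n in zip(reversed(idx), reversed(shape)):
        offset += i * stride
        stride *= n
    return offset

W, H, C, N = 4, 3, 2, 5  # small, arbitrary example sizes

# For every element, the WHCN column-major offset equals the NCHW row-major offset.
same_layout = all(
    col_major_offset((w, h, c, n), (W, H, C, N))
    == row_major_offset((n, c, h, w), (N, C, H, W))
    for w, h, c, n in product(range(W), range(H), range(C), range(N))
)
print(same_layout)  # True
```

In other words, the two libraries appear to disagree only in how the indices are written, not in how the data sit in memory.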

I think that in common image formats (e.g. PNG, BMP) the image data are packed in rows: when you scan the file, you read left to right, row by row. Julia arrays, on the other hand, are stored in column-major order, i.e. the first index varies fastest.

So, if you want to fill an array with the image data in the same order as they are scanned from the file, the array should have dimensions W×H×C. If you fill a 4-dimensional array with data from N files, scanned one after another, its size is W×H×C×N.
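To make the row-scan argument concrete, here is a small sketch for a single grayscale channel. It flattens a hypothetical H-row, W-column image in file order (left to right, then row after row) and reads that flat buffer as a column-major array of shape (W, H), which is roughly what `reshape(buf, W, H)` would do in Julia. The image contents and the `column_major_view` helper are made up for illustration; the check shows that the first index then corresponds to the width, with no transpose needed.

```python
W, H = 4, 3  # image width (columns) and height (rows)

# Hypothetical grayscale image as a viewer shows it: rows[y][x], y = row, x = column.
# Each value encodes its position (10 * row + column) so placement can be checked.
rows = [[10 * y + x for x in range(W)] for y in range(H)]

# Values in the order they are scanned from a row-packed file:
# left to right within a row, then row after row.
scan = [value for row in rows for value in row]

def column_major_view(buf, shape):
    """Return a getter that reads a flat buffer as a column-major array
    (first index varies fastest), the way Julia lays out arrays."""
    def get(*idx):
        offset, stride = 0, 1
        for i, n in zip(idx, shape):
            offset += i * stride
            stride *= n
        return buf[offset]
    return get

A = column_major_view(scan, (W, H))  # shape W×H, no copying or transposing

# A(x, y) is the pixel at column x, row y: the first index runs along the width.
print(all(A(x, y) == rows[y][x] for x in range(W) for y in range(H)))  # True
```

Asking for the "natural" H×W view instead would mean swapping the two indices, which is the transpose mentioned in the question.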

Thanks, that makes it a bit clearer. But still, when we load an image, the data are always presented in height × width format. I guess it is just unfortunate that Julia arrays are column-major while images are naturally row-oriented.

Silly me! In image files the channels are stored together for each pixel (interleaved), so the actual order of the data as scanned from the file is CWH, not WHC as I said.
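The correction can be checked with a short sketch. It builds hypothetical interleaved file data (for each pixel, row by row, the C channel values stored together, as in R, G, B, R, G, B, ...) and tests which column-major shape puts every value back in its right place. The pixel values and the `col_major_offset` helper are illustrative assumptions, not taken from any real image library.

```python
from itertools import product

W, H, C = 4, 3, 3  # width, height, channels (e.g. RGB)

# Hypothetical interleaved data in file order: row by row, left to right,
# with all C channel values of a pixel stored together.
# Each value encodes (row y, column x, channel c) so placement can be checked.
scan = [100 * y + 10 * x + c for y in range(H) for x in range(W) for c in range(C)]

def col_major_offset(idx, shape):
    """Linear offset in a column-major array (first index varies fastest, as in Julia)."""
    offset, stride = 0, 1
    for i, n in zip(idx, shape):
        offset += i * stride
        stride *= n
    return offset

# Reading the scan as a column-major C×W×H array puts every value in the right place.
cwh_ok = all(
    scan[col_major_offset((c, x, y), (C, W, H))] == 100 * y + 10 * x + c
    for c, x, y in product(range(C), range(W), range(H))
)

# Reading it as W×H×C does not: the channels are interleaved, not stored as separate planes.
whc_ok = all(
    scan[col_major_offset((x, y, c), (W, H, C))] == 100 * y + 10 * x + c
    for x, y, c in product(range(W), range(H), range(C))
)
print(cwh_ok, whc_ok)  # True False
```

So a buffer read straight from an interleaved file would, under these assumptions, still need a dimension permutation before it matches WHCN; in Julia that would be something like `permutedims(A, (2, 3, 1))` to turn C×W×H into W×H×C.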