According to the Flux documentation (https://fluxml.ai/Flux.jl/v0.10/models/layers/#Convolution-and-Pooling-Layers-1), the dimension order for input data is WHCN (width, height, channels, batch size).
I am a bit confused regarding why this ordering is used.
- When an image is loaded, the natural dimension order is (height, width). So to feed the data into a convolution layer we need to transpose it, which seems unnecessary.
- The example in the model zoo (https://github.com/FluxML/model-zoo/blob/master/vision/cifar10/cifar10.jl) doesn’t actually follow this document: it uses HWCN order instead of WHCN. One could argue that for images with width = height the ordering probably doesn’t make any difference.
- For comparison, PyTorch uses the NCHW order.
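For reference, the transpose mentioned in the first point could look roughly like this. It is only a sketch: `img_hwc` is a random stand-in for a loaded image, and the sizes are made up for illustration.

```julia
# Stand-in for a loaded RGB image stored as height × width × channels
# (hypothetical sizes; a real loader may return a different layout or element type).
h, w, c = 4, 6, 3
img_hwc = rand(Float32, h, w, c)

# Swap the first two dimensions to get width × height × channels.
img_whc = permutedims(img_hwc, (2, 1, 3))

# Append a batch dimension of size 1 to reach W × H × C × N.
batch = reshape(img_whc, w, h, c, 1)
```

Note that `permutedims` copies the data, so this does cost one pass over the image.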
I think that in common image formats (e.g. PNG, BMP) the image data are packed in rows. This means that when you scan the file, you do it left to right, row by row. Arrays in Julia, on the other hand, are stored in column-major order.
So, if you want to fill an array with the data of an image in the same order as it is scanned from the file, the dimension of the array should be W×H×C. If you are filling a 4-dimensional array with data from N files, scanned one after another, you do it in an array with size W×H×C×N.
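A tiny sketch of that point, using a made-up 2×3 grayscale image and assuming the file stores pixels row by row (the names `img_hw`, `scanned` and `img_wh` are just for illustration):

```julia
# A hypothetical image with 2 rows and 3 columns, written the way we read it
# on screen (height = 2, width = 3).
img_hw = [1 2 3;
          4 5 6]

# Row-by-row scan, as a row-packed file would deliver the pixels (assumed layout).
scanned = [1, 2, 3, 4, 5, 6]

# Julia fills arrays column by column, so the first dimension varies fastest.
# Reshaping the scan without reordering therefore gives a width × height array.
w, h = 3, 2
img_wh = reshape(scanned, w, h)

size(img_wh)                    # (3, 2), i.e. W × H
img_wh == permutedims(img_hw)   # true: the transpose of the H × W view
```

So the scan order is kept as-is in memory only if the array is declared W×H; declaring it H×W would require reordering the data.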
Thanks, that makes it a bit clearer. But still, when we load an image, the data will always be presented in height × width format. I guess it is just unfortunate that Julia arrays are column-major, while images are naturally row-oriented.
Silly me! In image files, channels are stored together for each pixel, so the real order of data as scanned from the files should be CWH, not WHC as I had said.
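To illustrate the corrected order, here is a small sketch that assumes an interleaved scan (all channels of a pixel stored together) for a made-up 2×2 image with 3 channels. Each value is encoded as 10 × pixel index + channel so the layout is easy to follow; none of this comes from a real file.

```julia
# Hypothetical interleaved scan: pixels numbered 1..4 row by row,
# each followed by its 3 channel values.
c, w, h = 3, 2, 2
scanned = [11, 12, 13,  21, 22, 23,   # row 1: pixel 1, pixel 2
           31, 32, 33,  41, 42, 43]   # row 2: pixel 3, pixel 4

# Without reordering, the fastest-varying dimension is the channel: C × W × H.
img_cwh = reshape(scanned, c, w, h)

# Reorder to W × H × C, then add a trailing batch dimension of size 1.
img_whc  = permutedims(img_cwh, (2, 3, 1))
img_whcn = reshape(img_whc, w, h, c, 1)
```

So even with a W×H×C×N array, data scanned this way would still need a `permutedims` to move the channel dimension into place.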