How to handle a DataLoader whose arrays have different dimensions?

Hi, I was trying to use Flux.DataLoader with the following code:

using Flux
X1 = rand(85761, 8, 24); X2 = rand(85761, 8, 1); y_1 = rand(85761, 8, 1); emb = rand(85761, 1)
train_loader = Flux.DataLoader((X1=X1, emb=emb, X2=X2, y_1=y_1), batchsize=500, shuffle=true)

but I got the following error:

DimensionMismatch("All data inputs should have the same number of observations, i.e. size in the last dimension. But data[:X1] (85761×8×24 Array{Float64, 3}) has 24, while data[:emb] (85761×1 Matrix{Float64}) has 1.")
I wonder why the number of data points is determined by the last dimension instead of the first? And how should I deal with my data in this case: swap dimensions 1 and 3, or rotate the dimensions (1, 2, 3) → (2, 3, 1)?

Thank you!

AFAIK, this has to do with Julia being column-major rather than row-major.

As for how to handle it, that depends on what you want to feed the data into. The docs for each layer type in Flux should describe what layout they assume for the data.
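For the arrays in the question, one option is to move the observation dimension to the end with permutedims. Here is a minimal sketch, assuming the first dimension (85761) is the number of observations. N is shrunk only so the example runs quickly, and the names with a `p` suffix are made up for illustration. Whether 8×24×N is the layout your layers actually expect is something to check against their docs.

```julia
using Flux  # provides Flux.DataLoader

N = 1_000  # assumption: small stand-in for the 85761 observations in the question

# Same layout as in the question: observations along the FIRST dimension.
X1  = rand(N, 8, 24)
X2  = rand(N, 8, 1)
y_1 = rand(N, 8, 1)
emb = rand(N, 1)

# Move the observation dimension to the end: (N, a, b) -> (a, b, N).
X1p  = permutedims(X1,  (2, 3, 1))   # 8×24×N
X2p  = permutedims(X2,  (2, 3, 1))   # 8×1×N
y_1p = permutedims(y_1, (2, 3, 1))   # 8×1×N
embp = permutedims(emb, (2, 1))      # 1×N

# Every array now has N as its last dimension, so the observation counts agree.
train_loader = Flux.DataLoader((X1=X1p, emb=embp, X2=X2p, y_1=y_1p);
                               batchsize=500, shuffle=true)

batch = first(train_loader)  # NamedTuple with the same keys as the input
size(batch.X1)   # (8, 24, 500)
size(batch.emb)  # (1, 500)
```

Note that permutedims copies the data, so for large arrays it may be worth generating or loading the data in the observations-last layout to begin with.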

What does “column-major” mean here? Does it mean that the number of data points should be the last dimension rather than the first?

“Column-major” means that Julia stores arrays in memory column by column: the first index varies fastest and the last index varies slowest, so a slice along the last dimension (e.g. X[:, :, i]) is one contiguous block of memory. As far as I know, that is why Flux.DataLoader treats the last dimension as the observation dimension: each observation, and each batch of consecutive observations, is then contiguous in memory, which makes slicing out batches cheap.
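To see the memory layout concretely, here is a small sketch using only Base Julia. The array A and its 2×3×4 shape are arbitrary choices for illustration and have nothing to do with Flux itself.

```julia
# Strides of a dense Array: stepping along the first dimension moves 1 element
# in memory, the second moves 2, the third moves 6 (for a 2×3×4 array).
s = strides(zeros(2, 3, 4))            # (1, 2, 6)

# Fill a 2×3×4 array with 1:24 in memory order.
A = reshape(collect(1:24), 2, 3, 4)

# A slice along the LAST dimension is one contiguous run of memory ...
last_slice = vec(A[:, :, 1])           # elements 1, 2, 3, 4, 5, 6

# ... whereas a slice along the FIRST dimension picks every other element.
first_slice = vec(A[1, :, :])          # elements 1, 3, 5, ..., 23

last_slice == 1:6       # true
first_slice == 1:2:23   # true
```

So taking observation i as A[:, :, i] reads one contiguous chunk, while A[i, :, :] has to hop through memory.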
