Dimensions of minibatch

I’m sort of struggling with the same issue as here, but the answer isn’t really clear to me.

What I’m trying to do is basic time-series forecasting with an RNN architecture, using minibatches.

I got the minibatches working for a Dense model, but the RNN seems to work differently.

As data, I have sequences of length S, and I have N observations. For a simple dense model, Chain(Dense(S, 10), Dense(10, 1)), I need to structure my data as an SxN array. After that, I used Flux.Data.DataLoader to create batches.
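For reference, roughly what I have for the dense case (the sizes, loss and optimiser are just placeholders):

```julia
using Flux
using Flux.Data: DataLoader

S, N = 20, 1000                        # sequence length and number of observations (made up)
train_x = rand(Float32, S, N)          # one observation per column
train_y = rand(Float32, 1, N)          # one target per observation

m = Chain(Dense(S, 10), Dense(10, 1))

loader = DataLoader((train_x, train_y), batchsize = 32, shuffle = true)

loss(x, y) = Flux.Losses.mse(m(x), y)
Flux.train!(loss, Flux.params(m), loader, ADAM())
```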

So far, this works. But now the RNN: simply changing the model to Chain(RNN(S,10), Dense(10,1)) does not work as expected.
Well, I can create a model m and call m(train_x) with train_x having dimensions SxN. That returns a 1xN output.

However, running this again on a subset with dimensions SxM, where M differs from N, returns a DimensionMismatch error.
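A minimal sketch of what I mean (the sizes are made up):

```julia
using Flux

S, N, M = 20, 100, 32
m = Chain(RNN(S, 10), Dense(10, 1))

m(rand(Float32, S, N))    # works, gives a 1xN output
m(rand(Float32, S, M))    # DimensionMismatch
```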

What I think is going on is that the RNN needs to know the number of features (which is 1 in my case) and is assuming that I have 210 features. So this suggests that, to use batches, I would need to add an extra dimension for the features. This is similar to what I have seen in TensorFlow.

I have also looked at the char-rnn.jl example from the Flux model-zoo, but I just can’t get it to work. I have tried different dimensions, reshaping, and so on. I was hoping someone could either explain what is going on there, or what the dimensions ought to be.

I am struggling with the same problem. I found that the model actually changes automatically after training with minibatches. For example, the hidden state of the model RNN(S,10) is 10x1 before training, but becomes 10xN after training with an SxN array. The output of model(a) (where a is an Sx1 array) then becomes a 10xN array. I do not understand why…
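A rough sketch of what I observe, assuming the hidden state is reachable as `m.state` (it is on the Flux version I am using):

```julia
using Flux

S, N = 20, 100
m = RNN(S, 10)

size(m.state)               # initial state: 10 entries (a vector or 10x1 matrix, depending on the Flux version)
m(rand(Float32, S, N))
size(m.state)               # (10, N) after a forward pass with an SxN batch

Flux.reset!(m)              # puts the state back to its initial size
```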

Hi,

Flux does recurrent layers a bit differently compared to other popular frameworks. I think it is pretty well explained in the docs, but the detail is easy to miss, and if one is used to how other frameworks do it, this is a new take.

The short story is that you need to feed the recurrent layers a sequence of examples. If you, for example, have S features over T timesteps and want to use a batch size of N, then your N examples could be arranged as an array of arrays [[SxN], [SxN], ...], where element t of the outer array holds the SxN examples for sequence step t.
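A sketch of what I mean, starting from a 3D block of data (the names and sizes are just for illustration):

```julia
using Flux

S, T, N = 3, 5, 8                     # features, timesteps, batch size
data = rand(Float32, S, T, N)         # however your data arrives

# Rearrange into the format the recurrent layers expect:
# a length-T vector whose element t is the SxN batch for timestep t.
seq = [data[:, t, :] for t in 1:T]

m = Chain(RNN(S, 10), Dense(10, 1))
outputs = [m(x) for x in seq]         # T outputs, each of size 1xN
```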

You can of course replace this with an iterator or any other means of producing the sequence in this format. The point is that the Flux RNN layers are stateful, meaning that they remember what you called them with the last time and will assume that what you call them with now is the next step in the sequence. To “end” the sequence, you call `Flux.reset!(model)`.
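So in a training loop you would typically reset at the sequence boundary, something along these lines (just a sketch, the loss is a placeholder and `seq`/`ys` follow the format from the sketch above):

```julia
# `seq` is a vector of SxN matrices (one per timestep) and
# `ys` the matching targets for each timestep.
function loss(seq, ys)
    Flux.reset!(m)                    # start each sequence from a fresh state
    sum(Flux.Losses.mse(m(x), y) for (x, y) in zip(seq, ys))
end
```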

That is why calling it with an SxM input fails: Flux then thinks you provided a different number of examples for the next timestep. Note how other frameworks enforce this by using 3D arrays as input to recurrent layers, since it is not possible to create a 3D array where this happens. They also typically hide the statefulness by implicitly resetting after a pass.

I guess this can serve as an example as to why the programming world does not like to deal with state :slight_smile:


Thanks, I think this clears it up enough for me to continue!