How to arrange data for time series forecasting (mini-batching) without exhausting GPU memory for an LSTM?

You’re feeding the model with a HUGE batch size of 82100. You likely want to break your data into batches of a small size: training on the whole dataset as a single batch is both memory-hungry and slow to converge, and splitting it into mini-batches is precisely the point of stochastic gradient descent.

The data argument to Flux.train! is an iterable whose length = number of batches. In the above example, [data] is of length one, hence one epoch is made out of a single batch.

Consider partitioning X and Y into chunks of a more reasonable batch size such as 64 or 128 (you can think of how the 50K MNIST data is split into small batches).
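As a sketch of that partitioning, using only Base’s `Iterators.partition` (the array names `X` and `Y` and the small synthetic sizes here are illustrative, not from your code):

```julia
# Small synthetic stand-in: 1000 observations, 6 features, 50 time steps.
n_obs, n_features, seq_len = 1_000, 6, 50
batch_size = 128

X = rand(Float32, n_features, seq_len, n_obs)  # features × steps × observations
Y = rand(Float32, seq_len, n_obs)

# Split observation indices into mini-batches; the last batch may be smaller.
batches = collect(Iterators.partition(1:n_obs, batch_size))

# For each batch, build the sequence-of-matrices layout Flux's recurrent
# layers step through: a vector of seq_len matrices of size (features × batch).
X_tr = [[X[:, t, idx] for t in 1:seq_len] for idx in batches]
Y_tr = [Y[:, idx]' for idx in batches]  # (batch × seq_len) per batch

length(batches)   # ceil(1000 / 128) == 8 mini-batches
size(X_tr[1][1])  # (6, 128)
```

The last batch simply carries the remaining 104 observations; Flux’s layers accept a smaller batch without any change.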

For example, the following would represent an iterable input for training on a dataset made of 1000 batches of size 128 (that is, the total dataset comprises 128,000 observations, each with 50 steps):

seq_len = 50       # time steps per sequence
batch_size = 128   # observations per mini-batch
num_batches = 1000
no_features = 6

# Each batch is a vector of seq_len matrices of size (no_features, batch_size),
# the layout Flux's recurrent layers expect when stepping through time.
X_tr = [[rand(Float32, no_features, batch_size) for i in 1:seq_len] for b in 1:num_batches]
# Each target batch is a (batch_size, seq_len) matrix.
Y_tr = [rand(Float32, batch_size, seq_len) for b in 1:num_batches]
data = zip(X_tr, Y_tr);

for d in data
  println(size(d[1]), size(d[2]))
end
(50,)(128, 50)
...
(50,)(128, 50)