Batch time-series input for RNN

How do I form a proper batch and feed it to an RNN?
It is possible to apply an RNN to a batch of single-time-step inputs:

using Flux

# single time step, batch of 10 samples
rnn = RNN(2,3)
seq = ones(Float32,(2,10))
rnn(seq) # apply to 10 samples independently

# Output:
3×10 Array{Float32,2}:
 -0.342626  -0.342626  -0.342626  …  -0.342626  -0.342626  -0.342626
 -0.56927   -0.56927   -0.56927      -0.56927   -0.56927   -0.56927
 -0.688904  -0.688904  -0.688904     -0.688904  -0.688904  -0.688904 

I also know how to apply it to a single sequence of data:

# Time series of length 5
seq = [rand(Float32,2) for i=1:5]
rnn.(seq) # apply to single sequence 
# Output:
5-element Array{Array{Float32,1},1}:
 [-0.7071419, 0.7136748, 0.478005]
 [-0.697015, 0.9357477, -0.79483825]
 [-0.77859926, 0.88962847, 0.55188143]
 [-0.7391649, 0.9885068, -0.6310922]
 [-0.76783645, 0.89206576, 0.26163444]

There is also a way to process a batch of time series:

seq = [rand(Float32,(2,10)) for i=1:5]
rnn.(seq) # apply to 10 sequences independently

# Output:
5-element Array{Array{Float32,2},1}:
 [-0.88525283 -0.69476825 … -0.89752537 -0.89838624; 0.9713873 0.9787338 … 0.9860863 0.9857219; -0.7818201 -0.367579 … -0.6444337 -0.6539876]
 [-0.91649914 -0.6018399 … -0.8816908 -0.88034606; 0.78649575 0.9152232 … 0.90690774 0.8112091; -0.250717 0.18231556 … 0.053293146 -0.27068266]
 [-0.8498657 -0.8136713 … -0.9134936 -0.91809046; 0.8846819 0.9200575 … 0.9767982 0.9774641; -0.37103912 -0.83741844 … -0.5929827 0.082197964]
 [-0.69060314 -0.8576724 … -0.7624638 -0.77607745; 0.95665026 0.75602245 … 0.95435596 0.9842143; 0.4997346 0.11458533 … 0.5946678 -0.15885094]
 [-0.43952018 -0.8901464 … -0.7802511 -0.51005286; 0.9877861 0.98162913 … 0.9833198 0.8864663; -0.32135123 -0.3325736 … -0.8334724 -0.22399448]

For my problem, however, I want to use Flux.Data.DataLoader on a dataset containing many time series.
My input has 3 dimensions, and the last one must be the batch dimension for DataLoader to work. In contrast, [rand(Float32,(2,10)) for i=1:5] is a vector of size (5,), with the batch dimension inside each element.
How do I construct a dataset so that DataLoader can batch it and the RNN can be applied to each batch?

AFAIK Flux's DataLoader does not do any collation for non-array inputs, so you'd have to handle that yourself before passing the data off to the RNN. If you don't mind the extra dependency, DataLoaders.jl (lorenzoh/DataLoaders.jl on GitHub), a parallel iterator for large machine learning datasets that don't fit into memory, inspired by PyTorch's `DataLoader` class, does handle this in a fashion similar to what you'd expect from PyTorch's DataLoader.
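One way to handle it yourself is to keep the whole dataset as a single 3-dimensional array and split each batch into per-time-step matrices just before calling the RNN. The following is only a minimal sketch in plain Julia. The helper name `unstack_time`, the layout (features, timesteps, series), and the sizes are assumptions made for illustration, not part of Flux.

```julia
# Hypothetical dataset: 100 series, each with 2 features and 5 time steps,
# stored as one array of size (features, timesteps, nseries).
features, timesteps, nseries = 2, 5, 100
data = rand(Float32, features, timesteps, nseries)

# Hypothetical helper: split a (features, timesteps, batch) array into a
# vector holding one (features, batch) matrix per time step.
unstack_time(x::AbstractArray{T,3}) where {T} = [x[:, t, :] for t in axes(x, 2)]

# Stand-in for one batch of 10 series, i.e. a slice along the last dimension.
batch = data[:, :, 1:10]      # size (2, 5, 10)

# Same layout as [rand(Float32,(2,10)) for i=1:5] in the question.
seq = unstack_time(batch)     # 5-element vector of 2×10 matrices
```

Because the last dimension of `data` indexes the series, something like `Flux.Data.DataLoader(data, batchsize=10)` should yield arrays shaped like `batch`, and each one could then be passed through `rnn.(unstack_time(x))`, probably with `Flux.reset!(rnn)` called before each batch so that hidden state does not carry over between unrelated sequences. Those calls assume the Flux version used in the question and may differ in other releases.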
