Suppose I have a recurrent model in Flux, say
    using Flux
    D = 2
    m = RNN(D, 3)
If I want to apply it to a sequence, then the documentation says I should represent my sequence seq as a vector of vectors, so something like
    T = 5
    seq = [rand(D) for _ in 1:T]
    y = m.(seq)
Now, how should I batch my sequences so that I can take advantage of GPU during training?
I’m guessing, if I have a batch of N sequences, each of length T, then I should represent them as a vector of T matrices, each of size D×N? So something like
    N = 7
    seq_batch = [rand(D, N) for _ in 1:T]
    y_batch = m.(seq_batch)
My understanding is that this will create N hidden state vectors, each independent of the others (and that calling reset!(m) will then reset every one of them), which is what I’m looking for. Is my understanding correct?
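
To make concrete what I mean by "independent", here is a minimal sketch using a hand-written Elman-style cell rather than Flux's own RNN. The names Wx, Wh, b and run_rnn are hypothetical and only for illustration; I am assuming Flux's layer behaves analogously, which is exactly what I would like confirmed. The check is that column n of the batched output matches running sequence n on its own.

```julia
using Random

# Hypothetical hand-written Elman cell; NOT Flux's implementation.
D, H, T, N = 2, 3, 5, 7
rng = MersenneTwister(0)
Wx, Wh, b = randn(rng, H, D), randn(rng, H, H), randn(rng, H)

# Run the recurrence over a sequence, starting from a zero hidden state.
# Each element of seq is either a D-vector (one sequence) or a D×N matrix (a batch).
function run_rnn(seq)
    h = zeros(H, size(first(seq), 2))   # one hidden-state column per sequence
    outputs = Matrix{Float64}[]
    for x in seq
        h = tanh.(Wx * x .+ Wh * h .+ b)
        push!(outputs, h)
    end
    return outputs
end

seq_batch = [rand(rng, D, N) for _ in 1:T]
y_batch = run_rnn(seq_batch)

# Column n of the batched run should equal running sequence n alone.
for n in 1:N
    y_single = run_rnn([x[:, n] for x in seq_batch])
    @assert all(y_batch[t][:, n] ≈ vec(y_single[t]) for t in 1:T)
end
```

In this sketch the hidden state is an H×N matrix, and since the matrix products act on each column separately, the N sequences never interact.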