Suppose I have a recurrent model in Flux, say
using Flux
D = 2
m = RNN(D,3)
If I want to apply it to a sequence, then the documentation says I should represent my sequence seq
as a vector of vectors, so something like
T = 5
seq = [rand(D) for _ in 1:T]
y = m.(seq)
Now, how should I batch my sequences so that I can take advantage of the GPU during training?
I’m guessing, if I have a batch of N sequences each of length T, then I should represent them as a vector of D×N matrices? So something like
N = 7
seq_batch = [rand(D,N) for _ in 1:T]
y_batch = m.(seq_batch)
My understanding is that this will create N hidden state vectors, each independent of the others (and then doing reset!(m) will reset each one of them), which is what I’m looking for. Is my understanding correct?
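One way I could test this empirically is to run the whole batch, then run a single column of the batch on its own after a reset, and compare the outputs. The following is only a sketch under some assumptions: it assumes a Flux release with the stateful recurrent layer and Flux.reset! (as in the snippets above; newer releases may use a different recurrent API), it uses Float32 inputs to match Flux's default parameter type, and it uses a comprehension instead of broadcasting so the time steps are applied explicitly in order. The names y_single and columns_match are just for illustration.

```julia
using Flux

D, T, N = 2, 5, 7
m = RNN(D, 3)   # same constructor form as above; assumes a Flux version that accepts it

# One D×N matrix per time step; column n holds sequence n at that step.
seq_batch = [rand(Float32, D, N) for _ in 1:T]

# Run the whole batch from a fresh state.
Flux.reset!(m)
y_batch = [m(x) for x in seq_batch]             # each output is 3×N

# Run only the first sequence (kept as a D×1 matrix) from a fresh state.
Flux.reset!(m)
y_single = [m(x[:, 1:1]) for x in seq_batch]    # each output is 3×1

# If the N hidden states are independent, column 1 of the batched output
# should match the output of running sequence 1 alone.
columns_match = all(y_batch[t][:, 1:1] ≈ y_single[t] for t in 1:T)
@show columns_match
```

If columns_match comes out true for every column I try, that would support the idea that each column carries its own hidden state and that reset! clears all of them together.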