Suppose I have a recurrent model in Flux, say

```
using Flux
D = 2
m = RNN(D,3)
```

If I want to apply it to a sequence, then the documentation says I should represent my sequence `seq` as a vector of vectors, so something like

```
T = 5
seq = [rand(D) for _ in 1:T]
y = m.(seq)
```

Now, how should I batch my sequences so that I can take advantage of the GPU during training?

I’m guessing that if I have a batch of `N` sequences, each of length `T`, then I should represent them as a vector of `D×N` matrices? So something like

```
N = 7
seq_batch = [rand(D,N) for _ in 1:T]
y_batch = m.(seq_batch)
```

My understanding is that this will create `N` hidden state vectors, each independent of the others (and then doing `reset!(m)` will reset every one of them), which is what I’m looking for. Is my understanding correct?
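
One way to sanity-check the independence claim is a minimal sketch in plain Julia. It does not call Flux at all: it assumes an Elman-style update `h_t = tanh.(Wx*x_t .+ Wh*h_prev .+ b)` with a zero initial state, and `rnn_step` and `run_rnn` are hypothetical helpers written only for this illustration. Under those assumptions, each column of the `D×N` input only ever interacts with the matching column of the hidden state, so running the batch should agree with running each sequence on its own.

```julia
using Random

# Hand-written Elman-style step (an assumption, not Flux's internals).
# x is D×N, h is H×N; the bias b broadcasts across the N columns.
rnn_step(Wx, Wh, b, h, x) = tanh.(Wx * x .+ Wh * h .+ b)

# Run a whole sequence (a vector of D×N matrices) from a zero hidden state.
function run_rnn(Wx, Wh, b, seq)
    h = zeros(size(Wh, 1), size(seq[1], 2))  # one hidden column per batch element
    ys = typeof(h)[]
    for x in seq
        h = rnn_step(Wx, Wh, b, h, x)
        push!(ys, h)
    end
    return ys
end

Random.seed!(0)
D, H, T, N = 2, 3, 5, 7
Wx, Wh, b = randn(H, D), randn(H, H), randn(H)
seq_batch = [rand(D, N) for _ in 1:T]

y_batch = run_rnn(Wx, Wh, b, seq_batch)

# Compare column j of the batched run with sequence j run alone (as a D×1 batch).
independent = all(1:N) do j
    single = [x[:, j:j] for x in seq_batch]
    y_single = run_rnn(Wx, Wh, b, single)
    all(y_batch[t][:, j:j] ≈ y_single[t] for t in 1:T)
end
println(independent)
```

The comparison uses `≈` rather than `==` because the batched and single-column matrix products may differ by floating-point rounding. Whether a given Flux version behaves the same way would still need to be checked against that version's own layer.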