Hi @thomaszub!
Flux RNNs and broadcasting work differently from their Python counterparts. In short, RNNs in Julia don’t operate on 3D arrays (except when using CUDA, but I believe that is handled at compile time). Instead, they operate on an array of matrices: one `features × batch` matrix per time step.
What this means for your example can be summarized in the following snippet:
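```julia
julia> using Flux

julia> rnn = RNN(9, 1);    # 9 input features, 1 output feature

julia> lstm = LSTM(9, 1);

julia> x = rand(Float32, 9, 128, 32);    # features × time steps × batch; Float32 matches Flux's default parameter type

julia> map(rnn, [view(x, :, t, :) for t ∈ 1:128])    # one 9×32 matrix per time step

julia> map(lstm, [view(x, :, t, :) for t ∈ 1:128])
```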
I’m using `map` instead of the usual broadcast (`rnn.(xs)`) because of a bug introduced with the move to Zygote. I believe it is fixed on master, but I don’t know which release it is part of. Effectively, the gradients were being truncated to a single time step back.
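If your data already comes in the `features × time × batch` layout used by the Python libraries, a small helper can do the slicing once. This is a minimal sketch in plain Julia (no Flux needed); the name `to_sequence` is hypothetical, and it assumes time is the second dimension, as in `x` above.

```julia
# Hypothetical helper: split a features × time × batch array into a vector of
# features × batch matrices, one per time step (the layout Flux RNNs consume).
# Using views avoids copying the underlying data.
function to_sequence(x::AbstractArray{T,3}) where {T}
    return [view(x, :, t, :) for t in axes(x, 2)]
end

x = rand(Float32, 9, 128, 32)   # 9 features, 128 time steps, batch of 32
xs = to_sequence(x)

println(length(xs))   # 128
println(size(xs[1]))  # (9, 32)
```

The resulting `xs` can then be passed to `map(rnn, xs)` just like the comprehension in the snippet. Since each element is a view, mutating `x` afterwards will also change what the RNN sees.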