It works! Thank you very much!
Here is a simple example. Do you know how to plot the loss during training?
using Flux
Nt = 100 # time steps
Nin,Nout = 5,3 # input size, output size
Nh = 28 # hidden dim
lstm = Chain(LSTM(Nin,Nh),Dense(Nh,Nout)) # simple lstm
# generate some fake data
X,Y = [randn(Float32,Nin,Nt) for i=1:10],[randn(Float32,Nout,Nt) for i=1:10]
data = Flux.Data.DataLoader((X, Y), batchsize=2)
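# loss broadcasts over the sequences in a batch.
# Note: each Nin×Nt matrix goes to lstm in a single call, so with the stateful
# Recur-based LSTM its Nt columns act as a batch of samples at one time step,
# not as Nt consecutive time steps.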
loss(x, y) = sum(Flux.Losses.mse.(lstm.(x), y))
ps = Flux.params(lstm)
Flux.train!(loss,ps,data,ADAM())
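One way to get a loss curve, sketched under a few assumptions: plotting uses Plots.jl (not part of the original code), `epochs`, `losses` and `loss.png` are illustrative names, and the Flux calls are the same older API the example already uses (`Flux.params`, implicit-parameter `Flux.train!`, `ADAM`, `Flux.Data.DataLoader`). The idea is to call `train!` once per epoch and record the mean batch loss after each pass:

```julia
using Flux, Plots

Nt = 100            # time steps
Nin, Nout = 5, 3    # input size, output size
Nh = 28             # hidden dim
lstm = Chain(LSTM(Nin, Nh), Dense(Nh, Nout))

# same fake data as above: 10 input/target pairs
X = [randn(Float32, Nin, Nt) for i in 1:10]
Y = [randn(Float32, Nout, Nt) for i in 1:10]
data = Flux.Data.DataLoader((X, Y), batchsize=2)

loss(x, y) = sum(Flux.Losses.mse.(lstm.(x), y))
ps = Flux.params(lstm)
opt = ADAM()

epochs = 20           # hypothetical number of passes over the data
losses = Float32[]    # hypothetical container: one entry per epoch
for epoch in 1:epochs
    Flux.train!(loss, ps, data, opt)   # one pass over all batches
    # mean loss over all batches after this epoch
    push!(losses, sum(loss(x, y) for (x, y) in data) / length(data))
end

p = plot(1:epochs, losses; xlabel = "epoch", ylabel = "loss", label = "training loss")
savefig(p, "loss.png")   # or display(p) in the REPL
```

Recording once per epoch keeps the extra forward passes cheap; pushing inside a custom batch loop would give a noisier per-batch curve instead.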
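If the goal is for the LSTM to step through the `Nt` time steps (as the `# time steps` comment suggests), the older Recur-based Flux API expects each sequence as a vector of per-time-step matrices (features × batch), fed one step at a time, with `Flux.reset!` clearing the hidden state between sequences. A minimal sketch under those assumptions, using batch size 1 and plain `zip` instead of `DataLoader` for simplicity (the data layout here is an illustrative choice, not the original one):

```julia
using Flux

Nt = 100            # time steps per sequence
Nin, Nout = 5, 3    # input size, output size
Nh = 28             # hidden dim
lstm = Chain(LSTM(Nin, Nh), Dense(Nh, Nout))

# assumed layout: each sequence is a Vector of Nt matrices of size features × 1
X = [[randn(Float32, Nin, 1) for t in 1:Nt] for i in 1:10]
Y = [[randn(Float32, Nout, 1) for t in 1:Nt] for i in 1:10]
data = zip(X, Y)    # each element is one (input sequence, target sequence) pair

function loss(x, y)
    Flux.reset!(lstm)   # start every sequence from the initial hidden state
    # feed one time step per call; the LSTM carries its state between steps
    sum(Flux.Losses.mse(lstm(xt), yt) for (xt, yt) in zip(x, y))
end

ps = Flux.params(lstm)
Flux.train!(loss, ps, data, ADAM())
```

Note that newer Flux releases redesigned the recurrent layers, so this pattern applies to versions that still provide `Recur`, `Flux.reset!` and `Flux.params`.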