I was wondering what the best practice is for using an LSTM (or any RNN) together with some convolutional layers in Flux, and more generally how to combine recurrent networks with other blocks.
Say I have this simple problem where I want to fit an LSTM to some random data:
```julia
using Flux

Nt = 100          # time steps
Nin, Nout = 5, 3  # input size, output size
Nh = 28           # hidden dim

lstm = Chain(LSTM(Nin, Nh), Dense(Nh, Nout))  # simple lstm

# generate some fake data
X = [randn(Float32, Nin, Nt) for i = 1:10]
Y = [randn(Float32, Nout, Nt) for i = 1:10]
data = Flux.Data.DataLoader(X, Y, batchsize = 2)

# loss uses broadcasting
loss(x, y) = sum(Flux.Losses.mse.(lstm.(x), y))

ps = Flux.params(lstm)
Flux.train!(loss, ps, data, ADAM())
```
From what I understand, this is the best practice when working with sequences, namely that you should broadcast when feeding the data to the model. Please let me know if this is not the case.
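For concreteness, here is a minimal sketch of the shapes involved as I understand them (same `Nin`, `Nt`, etc. as above; the `Flux.reset!` call is my assumption about handling hidden state between independent sequences):

```julia
using Flux

Nt, Nin, Nh, Nout = 100, 5, 28, 3
lstm = Chain(LSTM(Nin, Nh), Dense(Nh, Nout))

x = randn(Float32, Nin, Nt)  # one sequence as a (features, time) matrix
y = lstm(x)                  # same layout on the output side
@assert size(y) == (Nout, Nt)

Flux.reset!(lstm)  # clear the hidden state before the next independent sequence
```

So `lstm.(X)` in the loss simply applies this per-sequence call to each element of the batch vector.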
I want to get a better model and apply some 1d convolution before feeding to the LSTM.
I’ve managed to do it like this:
```julia
cnn_lstm = Chain(
    y -> Flux.unsqueeze(y', 3),  # Conv needs a tensor whose last dim is the batch size;
                                 # need to transpose the matrix as well
    Conv((3,), Nin => Nh, pad = 1),
    y -> y[:, :]',               # drop the singleton batch dim and transpose back
    LSTM(Nh, Nh),
    Dense(Nh, Nout))

loss(x, y) = sum(Flux.Losses.mse.(cnn_lstm.(x), y))
ps = Flux.params(cnn_lstm)
Flux.train!(loss, ps, data, ADAM())
```
However I'd imagine this can be done in a better/more efficient way?
In particular, I suppose feeding Conv with a tensor of size (Nt, Nin, num_batches) is recommended.
However, I would then need to reshape the result into an array of arrays to feed it to the LSTM? Plus, LSTM does not seem to accept 3d arrays…
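To make the question concrete, the conversion I have in mind would look roughly like this (an untested sketch; `h` stands in for the output of `Conv((3,), Nin => Nh, pad = 1)` applied to a (Nt, Nin, batch) tensor):

```julia
using Flux

Nt, Nh, batch = 100, 28, 2
h = randn(Float32, Nt, Nh, batch)   # stand-in for the Conv output, size (Nt, Nh, batch)

# slice along the time dimension into Nt matrices of size (Nh, batch),
# i.e. one batched time step per element
steps = [h[t, :, :] for t in 1:Nt]
@assert size(steps[1]) == (Nh, batch)

# an LSTM(Nh, Nh) could then be broadcast over these: lstm.(steps)
```

Is this slicing the intended way to bridge the two layer types, or is there a more idiomatic/efficient approach?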