After some playing around I found a solution. I could not find it anywhere else on the internet, so I thought it might be helpful to share. My issue was that I wanted to train an LSTM that learns the sequence of a time series and is also trained on several features. The answers I found online trained either only on the features or only on the sequence, but not on both.
My data consists of 3 time steps, 11 features and 1325 samples. To train the LSTM on the sequence, the input needs to be an array structured as (3, 11, 1325); however, to train the dense part of the model on the features, the input needs the features as its first dimension (e.g. (11, 3, 1325)). To solve this, I first run the LSTM on the sequence with the 3D input (3, 11, 1325), after which I reshape the LSTM output into a 2D array (11, 1325) using Flux.flatten, which then serves as the input for the dense layers.
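As a quick illustration of what Flux.flatten does to the shapes (random data, purely for demonstration), it merges every dimension except the last, sample dimension:

using Flux
x = rand(Float32, 1, 11, 1325)  # same shape as the output after LSTM(3, 3) and Dense(3, 1) below
size(Flux.flatten(x))           # returns (11, 1325): all but the last dimension are merged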
I am not 100% sure this is the correct way of doing it, but the predictions I generated on my actual data made sense using this approach. Note that in the example below, training is done using only one big batch (n = 1325). The batch size can be specified via the batchsize keyword when defining the DataLoader() (shown as a commented-out line in the code).
using Flux
using Flux.Data: DataLoader   # in newer Flux versions, DataLoader is exported from Flux directly
using Flux.Optimise: update!

# Generate random example data: (time steps, features, samples)
data_x = rand(Float32, 3, 11, 1325)
data_y = rand(Float32, 1, 1, 1325)
rnd_loader = DataLoader((data_x, data_y), shuffle = true)
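# The batch size can be set explicitly here, e.g. to make the single big batch explicit
# (the DataLoader default batchsize can differ between Flux versions) or to use mini-batches:
# rnd_loader = DataLoader((data_x, data_y), batchsize = 1325, shuffle = true)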
# Define model
model = Chain(
    LSTM(3, 3),
    Dense(3, 1),
    Flux.flatten,        # reshape the LSTM output (1, 11, 1325) into (11, 1325)
    Dense(11, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1))
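# For reference, how the shapes change through the Chain for this example input
# (assuming a Flux version where the LSTM accepts a 3D array, as used here):
# (3, 11, 1325) -> LSTM(3, 3) -> (3, 11, 1325) -> Dense(3, 1) -> (1, 11, 1325)
# -> Flux.flatten -> (11, 1325) -> dense layers -> (1, 1325)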
# Define loss function
Loss(x, y) = Flux.mse(x, y)
function loss_lstm(x, y)
    Flux.reset!(model)                          # reset the LSTM hidden state before each forward pass
    yhat_reshaped = reshape(model(x), 1, :, 1)  # predict y and reshape to (1, 1325, 1)
    Loss(yhat_reshaped, permutedims(y[:, :, :], [2, 3, 1]))  # bring y from (1, 1, 1325) to (1, 1325, 1)
end
loss_lstm(data_x, data_y)  # test that the loss function runs
opt = ADAM(0.001)          # ADAM optimiser with a learning rate of 0.001
# Train the model for 20 epochs
for epoch = 1:20
    @show epoch
    for d in rnd_loader
        gs = gradient(Flux.params(model)) do
            loss_lstm(d...)
        end
        update!(opt, Flux.params(model), gs)
    end
end
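For completeness, here is a minimal sketch of how predictions can then be generated (shown on the random example data; the hidden state is reset first, just like in the loss function):

# Generate predictions after training
Flux.reset!(model)
yhat = model(data_x)  # output shape (1, 1325): one prediction per sample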