Hello, folks

For a while I have been playing with `Flux.jl` to learn how to put together ML models, train them, and use them for predictive tasks. At the moment, I am stuck at the training stage of an LSTM model.

## Problem setup

Here’s a short summary of the problem: given a long time series (20 days of data at a 15-second granularity), break the data into smaller sequences, use these sequences to “fit” an LSTM model, and then produce a new set of forecasts on the fly.
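As a concrete sketch of that windowing step (the function and names below are made up for illustration, not my actual code):

```
series = collect(1.0:10.0)   # toy stand-in for the 20-day series

# Hypothetical windowing: slice the long series into (input, label) pairs.
# `window` past samples form one input; the label is the sample right after.
function make_windows(series, window)
    n = length(series) - window
    data   = [series[i:i+window-1] for i in 1:n]
    labels = [series[i+window] for i in 1:n]
    return data, labels
end

data, labels = make_windows(series, 6)
# data[1] is the first 6 values and labels[1] is the 7th
```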

Below there’s a piece of code for a function that trains the LSTM model. The setup is simple: the model is trained either until the loss is less than the loss tolerance or until it reaches a maximum epoch number. The loss is defined as the RMSE.

Initially, I used the `Flux.Data.DataLoader` call at line `(!A!)`, which is commented out in the code below. But the results were poor. The data being modelled are a non-stationary, highly irregular time series, and the forecasts I got were always a constant (or really close to one). After staring at the code for far too long, I started wondering what data the model was actually “seeing” during training. With line `(!A!)` uncommented, my impression was that training saw only part of the data.

When I ran `Flux.Data.DataLoader` and queried `data` and `labels`, I saw a vector of size 6 (the input size) and a vector of size 1, respectively. That made sense, but I don’t know how `data` and `labels` should change as the training goes on.
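For what it’s worth, here is a minimal standalone check (toy arrays, nothing from my actual model) of what the loader yields per batch:

```
using Flux

X = rand(Float32, 6, 8)   # 8 observations with 6 features each (columns)
Y = rand(Float32, 1, 8)

loader = Flux.Data.DataLoader((X, Y), batchsize=1, shuffle=false)

x1, y1 = first(loader)
size(x1), size(y1)        # (6, 1) and (1, 1): one observation per batch
length(collect(loader))   # 8: iterating visits every batch, `first` only one
```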

Then I added line `(!B!)` and saw that, no matter how many epochs it took, the value being printed was always the same. This has to mean that for as long as the model was training, the first value of `data` stayed the same, i.e. the vector `data` never got updated (nor did the 1-element vector `labels`). And this doesn’t seem right.

**How do I properly use `Flux.Data.DataLoader`? What should the variables `data` and `labels` be on every iteration of the `while` loop?**

I know I didn’t provide a working example, but I’m more than happy to clarify any point that might help you understand the problem better.

## The training stage

```
function train_lstm(model, data_, labels_, lr, max_epoch, loss_tolerance)
    """
    train_lstm: train an LSTM model
    - Inputs
        + model: a Flux.jl LSTM chain model
        + data_: training data
        + labels_: training labels
        + lr: learning rate
        + max_epoch: maximum number of training iterations
        + loss_tolerance: stop once the loss drops below this value
    - Output
        + None. The training is done "in place": the already created `model`
          is updated.
    """
    # Define the loss function (RMSE) ..........................................
    loss(x, y) = sqrt(mean((model(x) .- y) .^ 2))
    # Define the optimizer .....................................................
    opt = ADAM(lr)
    # Create data loader .......................................................
    # (!A!) data, labels = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=true) |> Iterators.first
    # Train the model ..........................................................
    loss_tol = loss_tolerance      # desired loss tolerance
    current_loss = Inf             # initialize current loss
    max_iterations = max_epoch     # cap iterations to prevent infinite loops
    iteration = 0
    while current_loss > loss_tol && iteration < max_iterations
        data, labels = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=false) |> Iterators.first
        # (!B!) println(data[1])
        Flux.train!(loss, params(model), Iterators.repeated((data, labels), 1), opt)
        # Update the current loss and iteration count
        current_loss = loss(data, labels)
        iteration += 1
        if isnan(current_loss)
            break
        end
    end
end
```
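For comparison, one common pattern (just a sketch under the same signature assumptions, not tested on my data) is to build the loader once and let `Flux.train!` walk over all of its batches each epoch, instead of taking only the first batch:

```
using Flux, Statistics

function train_lstm_sketch(model, data_, labels_, lr, max_epoch, loss_tolerance)
    loss(x, y) = sqrt(mean((model(x) .- y) .^ 2))
    opt = ADAM(lr)
    # Build the loader once; iterating it yields every batch, reshuffled each pass.
    loader = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=true)
    for epoch in 1:max_epoch
        Flux.train!(loss, params(model), loader, opt)  # one pass over all batches
        epoch_loss = mean(loss(x, y) for (x, y) in loader)
        epoch_loss < loss_tolerance && break
    end
end
```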

### Extra Information

```
model = Chain(
    Flux.LSTM(input_size => hidden_size),
    Flux.LSTM(hidden_size => aux),
    Flux.Dense(aux => output_size, sigmoid)
)
return (model)   # excerpted from a small model-builder function
```

This is the model I settled on after a few iterations. I looked for some sort of traditional/disciplined way to choose and place layers in an RNN model, but didn’t find much. I didn’t want a model that was too complex, so I “only” have three layers. The number of parameters, though, can be daunting if the input size and number of hidden nodes are large.
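To make that parameter count concrete (the sizes below are invented for illustration): each LSTM layer carries four gates, so its weight count scales roughly as 4·h·(in + h), and the totals add up quickly:

```
using Flux

input_size, hidden_size, aux, output_size = 6, 32, 16, 1
model = Chain(
    Flux.LSTM(input_size => hidden_size),
    Flux.LSTM(hidden_size => aux),
    Flux.Dense(aux => output_size, sigmoid)
)
# Total trainable parameters across all layers
n_params = sum(length, Flux.params(model))
# already several thousand parameters for these modest sizes
```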