Hello, folks
For a while I have been playing with Flux.jl to learn how to put together ML models, train them, and use them for predictive tasks. At the moment, I am stuck at the training stage of an LSTM model.
Problem setup
Here’s a short summary of the problem: given a long time series (20 days of data at a 15-second granularity), break the data into smaller sequences, use these sequences to fit an LSTM model, and then generate new forecasts on the fly.
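To make the setup concrete, here is a minimal sketch of the kind of windowing I have in mind (the window length of 6 matches the model’s input size below; the helper name and shapes are just illustrative):

using Flux

# Slide a window of length w over the series ts: each run of w past
# values becomes one input column, and the value right after it is the label.
function make_windows(ts::Vector{Float32}, w::Int)
    n = length(ts) - w
    data = reduce(hcat, (ts[i:i+w-1] for i in 1:n))  # w × n matrix
    labels = reshape(ts[(w+1):(w+n)], 1, n)          # 1 × n matrix
    return data, labels
end

data_, labels_ = make_windows(series, 6)  # series holds the full time series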
Below is the code for the function that trains the LSTM model. The setup is simple: the model is trained either until the loss drops below a given tolerance or until a maximum number of epochs is reached. The loss is defined as the RMSE.
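That is, for predictions ŷ and targets y of length n:

$$\mathrm{RMSE}(\hat{y}, y) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$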
Initially, I had the Flux.Data.DataLoader call as in line (!A!) below (now commented out), but the results were poor. The data being modelled are a non-stationary, highly irregular time series, and the forecasts I got were always a constant (or really close to one). After staring at the code for far too long, I began to wonder what data the model was “seeing” during training. With line (!A!) uncommented, my impression was that training only ever saw part of the data.
When I run the Flux.Data.DataLoader and query data and labels, I see a vector of size 6 (the input size) and a vector of size 1, respectively. That makes sense, but I don’t know how data and labels should change as the training goes on.
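For reference, this is roughly how I inspect that first batch (assuming data_ and labels_ hold one observation per column, as in the windowing sketch above):

loader = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=false)
data, labels = first(loader)  # grab the first batch only
@show size(data)              # e.g. (6, 1): one input window
@show size(labels)            # e.g. (1, 1): one target value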
Then I added line (!B!) and saw that, no matter how many epochs went by, the value being printed was always the same. That means that, for as long as the model was being trained, the first value of data stayed the same, which in turn means the vector data never got updated (nor did the 1-element vector labels). And this doesn’t seem right.
How do I properly use the Flux.Data.DataLoader? What should the variables data and labels be on each iteration of the while loop?
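My current guess is that each epoch should sweep over every batch the loader yields, something like the sketch below (with loss and opt as in the training function), but I am not sure this is what is intended:

loader = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=true)
for epoch in 1:max_epoch
    # train! walks through every (data, labels) batch once per call
    Flux.train!(loss, Flux.params(model), loader, opt)
end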
I know that I didn’t provide a working example, but I am more than happy to clarify any point that might help you understand the problem better.
The training stage
using Flux, Statistics

"""
    train_lstm(model, data_, labels_, lr, iter, max_epoch, loss_tolerance)

Train an LSTM model.

- Inputs
    + model: a Flux.jl LSTM chain model
    + data_: training data
    + labels_: training labels
    + lr: learning rate
    + iter: currently unused
    + max_epoch: maximum number of epochs for the training stage
    + loss_tolerance: stop once the loss drops below this value
- Output
    + None. The training is done "in place"; the already created model
      gets updated.
"""
function train_lstm(model, data_, labels_, lr, iter, max_epoch, loss_tolerance)
    # Define the loss function (RMSE) ..........................................
    loss(x, y) = sqrt(mean((model(x) .- y) .^ 2))
    # Define the optimizer .....................................................
    opt = ADAM(lr)
    # Create data loader .......................................................
    # (!A!) data, labels = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=true) |> first
    # Training the model .......................................................
    # Set desired loss tolerance
    loss_tol = loss_tolerance
    # Initialize current loss
    current_loss = Inf
    # Set a max. number of iterations to prevent infinite loops
    max_iterations = max_epoch
    iteration = 0
    while current_loss > loss_tol && iteration < max_iterations
        data, labels = Flux.Data.DataLoader((data_, labels_), batchsize=1, shuffle=false) |> first
        # (!B!) println(data[1])
        Flux.train!(loss, Flux.params(model), Iterators.repeated((data, labels), 1), opt)
        # Calculate the current loss and advance the iteration counter
        current_loss = loss(data, labels)
        iteration += 1
        if isnan(current_loss)
            break
        end
    end
end
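For completeness, this is how I call it (the hyperparameter values here are just placeholders):

# lr = 1e-3, iter = 1, max_epoch = 500, loss_tolerance = 1e-2
train_lstm(model, data_, labels_, 1e-3, 1, 500, 1e-2)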
Extra Information
# input_size, hidden_size, aux, and output_size are set elsewhere
model = Chain(
    Flux.LSTM(input_size => hidden_size),
    Flux.LSTM(hidden_size => aux),
    Flux.Dense(aux => output_size, sigmoid)
)
This is the model I settled on after a few iterations. I looked for some sort of traditional/disciplined way to choose and place layers in an RNN model, but didn’t find much. I didn’t want an overly complex model, so I “only” have three layers. The number of parameters, though, can be daunting if the input size and the number of hidden nodes are large.
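For what it’s worth, this is a quick way to check that parameter count (the sizes in the comment are just example values):

# e.g. with input_size = 6, hidden_size = 32, aux = 16, output_size = 1
n_params = sum(length, Flux.params(model))
@show n_params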