LSTM training for a sequence with multiple features using a batch size of 30

I am trying to do batch training with an LSTM on time series data with multiple features.
Say I have 5000 samples with 5 features each. The input is the previous 14 days and the output is a single value on the 15th day (my time step is 14). The sizes of my data are:
xtrain: (5000,14,5)
ytrain: (5000,1,1)

My model is below. How do I train on this data with a batch size of 30? I tried using DataLoader and Flux.train!, but neither works with this input size:

model = Chain(
    LSTM(5, 20),
    Dropout(0.5),
    Dense(20, 1, σ)
)

# evaluating the prediction for the input sequence
function eval_model(x)
    Flux.reset!(model)
    inputs = [x[1, t, :] for t in 1:14]
    output = model.(inputs)
end

L(x, y) = Flux.mse(eval_model(x), y)
opt = ADAM(0.001)

# working fine until here

# creating batches and training (not working)
train_loader = DataLoader((xtrain, ytrain), batchsize=30, shuffle=true)
@time Flux.train!(L, params(model), train_loader, opt)


Have you tried looking at the error Flux raises when the dataloader is constructed?

julia> using Flux, Flux.Data

julia> X = rand(5000,14,5);

julia> Y = rand(5000,1,1);

julia> train_loader = DataLoader((X, Y), batchsize=30)
ERROR: DimensionMismatch("All data should contain same number of observations")
Stacktrace:
 [1] _nobs
   @ ~/.julia/packages/Flux/0c9kI/src/data/dataloader.jl:104 [inlined]
 [2] DataLoader(data::Tuple{Array{Float64, 3}, Array{Float64, 3}}; batchsize::Int64, shuffle::Bool, partial::Bool, rng::Random._GLOBAL_RNG)
   @ Flux.Data ~/.julia/packages/Flux/0c9kI/src/data/dataloader.jl:73
 [3] top-level scope
   @ REPL[9]:1
 [4] top-level scope
   @ ~/.julia/packages/CUDA/3VnCC/src/initialization.jl:81

Remember that Flux expects the batch dimension to be the last dimension of any data rather than the first (which is what you may be familiar with from Python libraries). If you make that change:

julia> X = rand(14,5,5000);

julia> Y = rand(1,1,5000);

julia> train_loader = DataLoader((X, Y), batchsize=30);

julia> size.(first(train_loader))
((14, 5, 30), (1, 1, 30))

Now the dataloader batching works as expected.
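Applied to the (5000,14,5) and (5000,1,1) arrays from the original question, the rearrangement could be done with permutedims, roughly along these lines (a sketch using the question's variable names, not run against the actual data):

julia> xtrain_p = permutedims(xtrain, (2, 3, 1));  # (5000, 14, 5) -> (14, 5, 5000)

julia> ytrain_p = permutedims(ytrain, (2, 3, 1));  # (5000, 1, 1) -> (1, 1, 5000)

julia> train_loader = DataLoader((xtrain_p, ytrain_p), batchsize=30, shuffle=true);

julia> size.(first(train_loader))
((14, 5, 30), (1, 1, 30))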


Thanks a lot for your reply! I adjusted the data using the reshape function and it worked.


Hi, I ran this code, but I receive this message: "attempt to access 14×5×30 Array{Float64, 3} at index [1, 6, 1:30]". What is wrong?

How did you reshape the data?

@reemmasoud123 Could you share the code for how you trained and fitted your LSTM model with features? I have a similar 3D array with 1325 samples, 11 features and 3 time steps, but I found very limited documentation online on how to actually fit an LSTM model on a dataset with features. Additionally, I was confused because @ToucheSir mentioned that DataLoader requires an array with (time, features, batch) as input, while here it uses an array with (features, batch, time) as input to fit a simple RNN.

Did you mean to link a different page? That one doesn’t mention what you wrote. Recurrence · Flux talks more about RNN input formats but not the 3D input one because (until recently) we considered that experimental.

If your dimensions are off then you’ll have to use something like permutedims to switch them to the correct order. Alternatively, pass the set of sequences to the dataloader as a vector of arrays instead of one big array. Then it’ll still split along the observation dimension into batches, and you can create the dense (features, batch, time) arrays after you get a batch out of the dataloader.
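As a rough sketch of that second option, using the 11-feature, 3-time-step shapes mentioned in this thread (illustrative only; the random data here is a placeholder):

using Flux
using Flux.Data: DataLoader

# one (features, time) matrix per sample, 1325 samples in total
xs = [rand(Float32, 11, 3) for _ in 1:1325]
ys = [rand(Float32, 1) for _ in 1:1325]

loader = DataLoader((xs, ys), batchsize = 30, shuffle = true)

xb, yb = first(loader)                              # xb is a Vector of 30 (11, 3) matrices
x3d = permutedims(cat(xb...; dims = 3), (1, 3, 2))  # dense (features, batch, time) = (11, 30, 3)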

@ToucheSir The page that I referred to states: “Folding over a 3d Array of dimensions (features, batch, time) is also supported.”, which confused me.

Nevertheless, when structuring the 3D array as (time, feature, samples), as you suggested, it was straightforward to put the x and y data into a DataLoader, so that part worked fine. However, it is unclear to me how to use the DataLoader to train the model, as I keep getting errors regarding the dimension of the input. Perhaps it helps if I share my code:

data_x = rand(Float32, 3, 11, 1325) # create x data as a 3D array (time, feature, samples)
data_y = rand(Float32, 1, 1, 1325)  # create y data
batch_size = 30
rnd_loader = DataLoader((data_x, data_y), batchsize = batch_size, shuffle = true)
size.(rnd_loader.data) # check dimensions

# define the model
model = Chain(
    LSTM(11, 10),
    Dense(10, 2, tanh),
    Dense(2, 1))

# reset the model, then loop through the time steps
function eval_model(x)
    Flux.reset!(model)
    inputs = [x[t, :, :] for t in 1:3]
    output = model.(inputs)
end

L(x, y) = Flux.mse(eval_model(x), y)
opt = ADAM(0.001)
parameters = Flux.params(model)

train!(L, parameters, rnd_loader, opt) # I get an error here

Thanks, the original link wasn’t to that section so I missed it. In general, train! does not work very well with RNNs. We’re trying to redesign the API so that it might work better in the future, but for now you’ll want to use a custom training loop instead.
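For example, a bare-bones custom loop over the DataLoader and model from the post above might look roughly like this (an untested sketch; in particular, comparing only the last time step's output against y reshaped to (1, batch) is an assumption, not something prescribed by Flux):

ps = Flux.params(model)
opt = ADAM(0.001)

for epoch in 1:10
    for (x, y) in rnd_loader                # x is (time, features, batch), y is (1, 1, batch)
        Flux.reset!(model)                  # clear the hidden state before each sequence
        gs = Flux.gradient(ps) do
            outputs = [model(x[t, :, :]) for t in 1:size(x, 1)]
            Flux.mse(outputs[end], reshape(y, 1, :))   # loss on the final time step only
        end
        Flux.Optimise.update!(opt, ps, gs)
    end
end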


After some playing around I found the solution. I could not find this solution anywhere else on the internet, so I thought it might be helpful to share it. The issue for me was that I wanted to train an LSTM that learns the sequence of a time series and is also trained on several features. The answers I found online were trained either only on the features or only on the sequence, but not on both.

My data consists of 3 time steps, 11 features and 1325 samples. To train the LSTM on the sequence, it requires an array structured as (3, 11, 1325); however, to train the model on the features, it needs an input with the features as the first dimension (e.g. (11, 3, 1325)). To solve this, I first run the LSTM on the sequence with the 3D input (3, 11, 1325), after which I reshape the LSTM output into a 2D array (11, 1325) using Flux.flatten, which then serves as the input for the dense layers.

I am not 100% sure this is the correct way of doing it, but the predictions I generated on my actual data made sense using this approach. Note that the DataLoader in the example does not set a batchsize, so Flux's default batch size of 1 is used; a different batch size can be specified when defining the DataLoader().

using Flux
using Flux.Data: DataLoader
using Flux.Optimise: update!

# Generate data
data_x = rand(Float32, 3, 11, 1325)
data_y = rand(Float32, 1, 1, 1325)
rnd_loader = DataLoader((data_x, data_y), shuffle = true)

# Define model
model = Chain(
    LSTM(3, 3),
    Dense(3,1),
    Flux.flatten, #Reshape the LSTM output (1,11,1325) into (11,1325)
    Dense(11, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1))

# Define loss function
Loss(x, y) = Flux.mse((x), y)
function loss_lstm(x, y)
    Flux.reset!(model)
    yhat_reshaped = reshape(model(x), 1, :, 1) #Predict y and reshape to (1,1325,1)
    Loss(yhat_reshaped, permutedims(y[:,:,:],[2, 3, 1])) 
end
loss_lstm(data_x, data_y) #Test if loss function works

opt = ADAM(0.001) #Set optimiser learning rate

# Train model on 20 epochs
for epoch = 1:20
    @show epoch
    for d in rnd_loader
      gs = gradient(Flux.params(model)) do
        l = loss_lstm(d...)
      end
      update!(opt, Flux.params(model), gs)
    end
end

Hi, thanks for the example.

I wonder how this really works with an RNN/LSTM when your data is (time steps, features, samples). I understand that DataLoader creates batches along the 3rd dimension, and I understand that the flatten works in the Chain. But I do not follow how it works with the RNN and hidden states, since the documentation (Built-in Layers · Flux) states that a 3D data array is supposed to be (features, batch, time). Have you checked the hidden states and their updates? What about computation time when the 3rd dimension is 1325? I just wonder about the proper use of RNN/LSTM, because sometimes Flux seems not to train well/fast enough when working with recurrent networks.

A simple example with an RNN might be:

# Simple model
julia> model = Flux.RNN(1,1)
Recur(
  RNNCell(1 => 1, tanh),                # 4 parameters
)         # Total: 4 trainable arrays, 4 parameters,
          # plus 1 non-trainable, 1 parameters, summarysize 248 bytes.

# 3D array: 1 feature, batch of 2, 2 time steps
julia> x0 = reshape(1:4, 1, 2, 2)
1×2×2 reshape(::UnitRange{Int64}, 1, 2, 2) with eltype Int64:
[:, :, 1] =
 1  2

[:, :, 2] =
 3  4

julia> x1 = [x0[:,:,i] for i in axes(x0,3)]
2-element Vector{Matrix{Int64}}:
 [1 2]
 [3 4]

# run for the first time step
julia> model(x1[1])
1×2 Matrix{Float32}:
 -0.547279  -0.842282

julia> model.state # state is same as the output
1×2 Matrix{Float32}:
 -0.547279  -0.842282

# run for the second step
julia> model(x1[2])
1×2 Matrix{Float32}:
 -0.938914  -0.979353

julia> model.state
1×2 Matrix{Float32}:
 -0.938914  -0.979353

# use the original 3D array, i.e. run over both time steps at once
julia> Flux.reset!(model)
1×1 Matrix{Float32}:
 0.0

julia> model(x0)
1×2×2 Array{Float32, 3}:
[:, :, 1] =
 -0.547279  -0.842282

[:, :, 2] =
 -0.938914  -0.979353

julia> model.state # state is same as above, here RNN is run over 2 time steps
1×2 Matrix{Float32}:
 -0.938914  -0.979353

It is the same as in the documentation, and has been repeated on the forum many times. I just wonder how it works for you with the time dimension and time series data.