I have built a time-series forecasting model in Flux.jl that includes an LSTM layer (the problem is the same when I use a GRU layer instead). I can train the model without errors, and model(train_samples) works just fine. However, when I run model(val_samples) or model(test_samples), the model does not return a vector of forecasted targets; it throws an error instead.
My model is the following:

model = Chain(
    Flux.flatten,
    LSTM(3, 32),
    Dense(32, 32, relu),
    Dense(32, 1)
)
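For reference, Flux.flatten keeps the last (batch) dimension and collapses all the others, so the (3, 1, N) arrays below reach the LSTM as 3×N matrices. A minimal check of the shape:

julia> size(Flux.flatten(train_samples))
(3, 7642)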
The data I have is:
julia> size(train_samples)
(3, 1, 7642)
julia> size(val_samples)
(3, 1, 955)
julia> size(test_samples)
(3, 1, 955)
And the labels are:
julia> size(train_targets)
(1, 7642)
....
I train the model with:
ps = Flux.params(model)
opt = Flux.RMSProp()
loss(x, y) = Flux.Losses.mae(model(x), y)

epochs = 300
loss_history = []

for epoch in 1:epochs
    Flux.train!(loss, ps, [(train_samples, train_targets)], opt)
    train_loss = loss(train_samples, train_targets)
    push!(loss_history, train_loss)
    println("Epoch = $epoch Training Loss = $train_loss")
end
I correctly get results for the training data:
julia> model(train_samples)
1×7642 Matrix{Float32}:
0.0961234 0.0967543 0.0972749 0.0951103 0.0937003 0.091914 0.0913435 … 38.6692 43.5427 43.0876 43.2159 43.4824 43.612 43.5726 43.5831
The error I get, after having successfully trained the model, is the following:
julia> model(val_samples)
ERROR: DimensionMismatch: array could not be broadcast to match destination
Stacktrace:
[1] check_broadcast_shape
@ .\broadcast.jl:553 [inlined]
[2] check_broadcast_shape
@ .\broadcast.jl:554 [inlined]
[3] check_broadcast_axes
@ .\broadcast.jl:556 [inlined]
[4] instantiate
@ .\broadcast.jl:297 [inlined]
[5] materialize!
@ .\broadcast.jl:884 [inlined]
[6] materialize!
@ .\broadcast.jl:881 [inlined]
[7] muladd(A::Matrix{Float32}, B::Matrix{Float32}, z::Matrix{Float32})
@ LinearAlgebra C:\Users\User\AppData\Local\Programs\Julia-1.9.3\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:249
[8] (::Flux.LSTMCell{Matrix{Float32}, Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}})(::Tuple{Matrix{Float32}, Matrix{Float32}}, x::Matrix{Float64})
@ Flux C:\Users\User\.julia\packages\Flux\ljuc2\src\layers\recurrent.jl:314
[9] Recur
@ C:\Users\User\.julia\packages\Flux\ljuc2\src\layers\recurrent.jl:134 [inlined]
[10] macro expansion
@ C:\Users\User\.julia\packages\Flux\ljuc2\src\layers\basic.jl:53 [inlined]
[11] _applychain(layers::Tuple{typeof(Flux.flatten), Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}, x::Array{Float64, 3})
@ Flux C:\Users\User\.julia\packages\Flux\ljuc2\src\layers\basic.jl:53
[12] (::Chain{Tuple{typeof(Flux.flatten), Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})(x::Array{Float64, 3})
@ Flux C:\Users\User\.julia\packages\Flux\ljuc2\src\layers\basic.jl:51
[13] top-level scope
@ REPL[134]:1
Applying the same data to a model that contains no recurrent cell, only Dense layers, does not cause the problem: I can get forecasts for the training, validation, and test samples without issues.
model = Chain(
    Flux.flatten,
    Dense(3, 32, relu),
    Dense(32, 32, relu),
    Dense(32, 1)
)
I don't understand where the dimension mismatch originates. My suspicion is that I have defined the recurrent chain incorrectly, so that the batch dimension of train_samples somehow carries over into later calls, but I don't see how that could happen.
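To illustrate what I mean, here is a minimal, untested sketch of how I would check whether leftover recurrent state is involved (it assumes the LSTM is the second layer of the Chain, and that Flux's Recur wrapper stores its state in a state field):

# After model(train_samples), inspect the hidden/cell state kept by the Recur wrapper:
h, c = model[2].state
size(h)    # if this is (32, 7642), the state is still sized to the training batch

# Flux.reset! restores the initial state; after that, a batch of a different
# size should no longer collide with a stale 7642-column state:
Flux.reset!(model)
model(val_samples)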
Thank you in advance for any response to this.