Simple Flux LSTM for Time Series

If I have understood it correctly, the loss function should look something like this:

loss(inputs, output) = sum(abs2.(output.-vcat(model.(inputs)...)[:, end]))

if you only want to compare the model output with the target at the end of each batch. Will the internal state be reset after each batch, as with stateful = false? If not, would it not make more sense to pass the data as one big batch? After a couple of time steps the state would be close to the next time step's target value, so repeating the model calculation wouldn't be necessary.

If the state is reset, is the above loss function efficient, or is there a better way to calculate the MSE?

Can you please demonstrate how training can be done after this step, using a batch size of 30 for example? I tried the line below, but it didn't work for the 3D data:

train_loader = DataLoader((trainX, trainY), batchsize=30, shuffle=true)

It’s impossible to answer this without seeing what trainX and trainY are. That is, the full type, dimensionality, etc.


I posted a question in the above link with the details.

Hi, sorry for bumping the topic.
I tried to run the code you wrote, which is below:

using Flux

m = Chain(LSTM(3,2), Dense(2,1))

inputs = rand(3,4)

for t in 1:4
    output = m(inputs[:,t])
    @show output
end

I don’t know why it doesn’t work. It’s basic code and there is no reason it shouldn’t run. I get the error below when I run it:

ERROR: LoadError: MethodError: no method matching (::Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}})(::Tuple{Matrix{Float32}, Matrix{Float32}}, ::Vector{Float64})
Closest candidates are:
  (::Flux.LSTMCell{A, V, <:Tuple{AbstractMatrix{T}, AbstractMatrix{T}}})(::Any, ::Union{AbstractVector{T}, AbstractMatrix{T}, Flux.OneHotArray}) where {A, V, T} at ~/.julia/packages/Flux/BPPNj/src/layers/recurrent.jl:157
 [1] (::Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}})(x::Vector{Float64})
   @ Flux ~/.julia/packages/Flux/BPPNj/src/layers/recurrent.jl:47
 [2] applychain(fs::Tuple{Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}, x::Vector{Float64})
   @ Flux ~/.julia/packages/Flux/BPPNj/src/layers/basic.jl:47
 [3] (::Chain{Tuple{Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})(x::Vector{Float64})
   @ Flux ~/.julia/packages/Flux/BPPNj/src/layers/basic.jl:49
 [4] top-level scope
   @ ~/Desktop/b/new.jl:8
in expression starting at /home/user/Desktop/b/new.jl:7

What can be the reason? Thanks in advance.

Just use inputs = rand(Float32, 3, 4). The model's parameters are Float32, as the error message shows, so the inputs need to be Float32 from the start.
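
In case it helps, here is a minimal sketch of the two usual ways to end up with Float32 inputs, in plain Julia with no Flux calls. The variable names are only for illustration.

```julia
x64  = rand(3, 4)              # rand defaults to Float64
x32  = rand(Float32, 3, 4)     # generate Float32 directly
x32b = Float32.(x64)           # or convert data that is already Float64

@show eltype(x64) eltype(x32) eltype(x32b)
```

The broadcasted conversion is the one to use when the data comes from a file or another package rather than from rand.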


It works, thank you so much!

I tried the above and it does indeed work, but when I follow it with

m(inputs)

I also get results (a 1 by 4 row vector) where the first element matches the result from the loop but the other numbers don’t. Please pardon my ignorance, but could you please explain why this is?

Another (not entirely unrelated?) question: does Flux reset during training with each new pattern?

Thanks in advance

Maybe this explains it? I would guess the internal state is updated each time the model is run, no matter whether it is on a single data point or a batch. So in the first for loop, the first data point sees the reset state of the LSTM, but each later one encounters a state that is based on the previous data. In the batched case, or in the loop with the reset, all data points are calculated from the reset state of the LSTM.

julia> for t in 1:4
           output = m(inputs[:,t])
           @show output
       end
output = Float32[0.052006032]
output = Float32[0.12330223]
output = Float32[0.21572198]
output = Float32[0.20931965]

julia> Flux.reset!(m)

julia> m(inputs)
1×4 Matrix{Float32}:
 0.052006  0.0927443  0.137698  0.0787673

julia> for t in 1:4
           Flux.reset!(m)
           output = m(inputs[:,t])
           @show output
       end
output = Float32[0.052006032]
output = Float32[0.09274433]
output = Float32[0.13769753]
output = Float32[0.07876728]
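
The same effect can be reproduced without Flux. The sketch below uses a made-up ToyRecur type (hypothetical, not Flux's implementation) that just keeps one number as hidden state between calls, which is the behaviour I am assuming for the LSTM above.

```julia
# Toy stand-in for a recurrent layer (hypothetical, not Flux code):
# it remembers a hidden state h between calls.
mutable struct ToyRecur
    h::Float32     # current hidden state
    h0::Float32    # initial state restored by reset!
end

# Each call mixes the input with the previous state and stores the result.
function (r::ToyRecur)(x::Float32)
    r.h = tanh(0.5f0 * r.h + x)
    return r.h
end

reset!(r::ToyRecur) = (r.h = r.h0; nothing)

r  = ToyRecur(0f0, 0f0)
xs = Float32[0.1, 0.2, 0.3]

carried = [r(x) for x in xs]                 # state carries over between calls
fresh   = [(reset!(r); r(x)) for x in xs]    # state reset before every call

@show carried fresh
```

Only the first element agrees between the two runs, which is the pattern in the outputs above.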

Thanks so much for that. I see. But that leads to my question about training… during training surely the forecast for a given sequence should not depend on which sequence (or batch) came before it(?) I had understood that the sequential structure was assumed only within a given vector of input matrices - that there was assumed no spatial or time-relation between these vectors. Have I got it wrong?
Thanks again

I’ve not really used this myself, but a quick glance at the docs seems to suggest you have to do that yourself.

In many situations, such as when dealing with a language model, the sentences in each batch are independent (i.e. the last item of the first sentence of the first batch is independent from the first item of the first sentence of the second batch), so we cannot handle the model as if each batch was the direct continuation of the previous one. To handle such situations, we need to reset the state of the model between each batch, which can be conveniently performed within the loss function:

function loss(x, y)
  Flux.reset!(m)
  sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))
end

Thank you so much, that is exactly what I was looking for. I didn’t see that last bit of the docs which you quoted at the end there!

I understand a bit more, but I’m still looking for a simple notebook (zoo?) example where someone has applied this to a basic time series model.

Thanks again for all your help.


Currently data is assumed to be in the following shapes for the recurrent layers:

X = [rand(Float32, in) for _ in 1:T] # not batched vector over time steps
Flux.reset!(m) # reset the hidden state to m.cell.state0
res = [m(x) for x in X] # each element is out \by 1

X = [rand(Float32, in, batch) for _ in 1:T] # batched vector over time steps
Flux.reset!(m) # reset the hidden state to m.cell.state0
res = [m(x) for x in X] # used same as above. Each element is out \by batch

X = rand(Float32, in, batch, T)
res = m(X) # should produce a matrix of out \by batch \by T

So in your example, the input inputs = rand(Float32,3,4) is assumed to be a single batch: 3 features for each of 4 samples at one time step, not a sequence of 4 time steps.
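
If your data is in one layout and you need the other, converting takes only a comprehension. This is a sketch in plain Julia; the sizes and the names in_dim, X3 and Xseq are placeholders I made up for illustration.

```julia
in_dim, batch, T = 3, 5, 4                # hypothetical sizes
X3 = rand(Float32, in_dim, batch, T)      # features × batch × time

# Vector of T matrices, each in_dim × batch, ordered by time step
Xseq = [X3[:, :, t] for t in 1:T]

# And back again to a single 3D array
X3_again = cat(Xseq...; dims=3)

@show length(Xseq) size(Xseq[1]) size(X3_again)
```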

I think the flux model zoo should have what you are looking for. I also have a jupyter notebook as an example, but it is getting pretty out of date. It should still work but the map(rnn, x)[end] should be replaced with [rnn(_x) for _x in x] due to guaranteed ordering.

Thanks for your suggestions. I am trying to apply LSTM to a time series (scalar) and I don’t see anything even remotely relevant to that in the zoo, and your MNIST notebook is completely different [I don’t know why recurrence would be useful in this case, actually, but that is just my ignorance, I’m sure]

I just want input sequences of 30 (days) each to forecast the next day, and want to assess my model on the performance on the quality of the forecast on the 30th day(only). This should be a prototypical time series forecasting problem, but I see nothing like this in any examples anywhere.
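
To make the setup concrete, this is roughly the data layout I have in mind, sketched in plain Julia. The helper name make_windows and the sine series are placeholders of my own, not anything from Flux.

```julia
# Build sliding windows: each input is `seqlen` consecutive values,
# and the target is the single value that follows the window.
function make_windows(series::AbstractVector, seqlen::Int)
    nwin = length(series) - seqlen
    X = [series[i:i+seqlen-1] for i in 1:nwin]
    y = [series[i+seqlen] for i in 1:nwin]
    return X, y
end

series = Float32.(sin.(0.1 .* (1:100)))   # placeholder data, 100 "days"
X, y = make_windows(series, 30)

@show length(X) length(X[1]) length(y)
```

Each X[i] would then be fed to the recurrent model one value at a time, and only the output after the last value would be compared with y[i].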
Anyway, thanks again!


A working example of time series regression has never been in the model zoo, I believe. I finally got hold of an nvidia gpu, so I will work on making a simple example when I finish teaching this term, if one doesn’t appear before.


There are at least a few discussions and examples/MWEs around this kind of univariate time series forecasting with Flux RNNs floating around community forums. How to train Flux to learn a sequence conditional to some initial "seeds"? is a recent example I just remembered. The reason such a thing does not exist in the model zoo is probably two-fold:

  1. Model zoo entries don’t write themselves 😛
  2. A LSTM is a big hammer to model a 30-sample univariate timeseries forecasting problem with. Generally we try to strike a balance between clear, brief files and sufficiently “common” or “interesting” datasets and tasks in the model zoo to differentiate it from tutorials. In this case, perhaps something like forecasting with a UCI benchmark dataset would be appropriate.

Here’s a simple LSTM model that forecasts AR1 or MA1 data pretty well. I’m going to put this in a github archive for further work, but here’s an initial version. BTW, this needs a train/test split. At the moment, it’s probably over-fitting.

using Flux, Plots, Statistics
using Base.Iterators

# DGPs: AR1 is more forecastable than MA1
function MA1(n, σ)
    e = randn(n+1) .* σ
    y = e[2:n+1] + 0.9*e[1:n]
    return y
end

function AR1(n, σ)
    y = zeros(n)
    for t = 2:n
        y[t] = 0.9*y[t-1] + σ*randn()
    end
    return y
end

# generate the data
n = 10000 # sample size
σ = 1.0  # true std. dev. of the shocks
data = Float32.(AR1(n, σ))

# set up the training
batchsize = 2  # remember, MA model is only predictable one step out
epochs = 100   # number of training loops through data

# the model: this is just an initial guess
# need to experiment with this
m = Chain(LSTM(batchsize, 10), Dense(10,2, tanh), Dense(2,batchsize))

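function loss(x,y)
    # the body is missing from the post; assumed here: reset the state, then MSE
    Flux.reset!(m)
    Flux.mse(m(x), y)
end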

# the first element of the batched data is one lag
# of the second element, in chunks of batchsize. So,
# we are doing one-step-ahead forecasting, conditioning
# on batchsize lags
batches = [(data[ind .- 1], data[ind])  for ind in partition(2:size(data,1), batchsize)]
batches = batches[1:end-1] # drop the last, which may not have full size
Flux.@epochs epochs Flux.train!(loss,Flux.params(m), batches, ADAM())

function predict(data, batchsize)
    n = size(data,1)
    yhat = zeros(n)
    for t = batchsize+1:n
        x = data[t-batchsize:t-1]   # the batchsize lags before t
        yhat[t] = m(x)[end]         # last output is the forecast for t
    end
    return yhat
end

pred = predict(data,batchsize)
error = data - pred
plot(1:n, [data error])
println("true std. error of noise: ", σ)
println("std. error of forecast: ", std(error))
println("std. error of data: ", std(data))
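
For the train/test split mentioned above, a chronological cut is probably the simplest option. This is only a sketch in plain Julia: split_series and the 0.8 fraction are my own placeholders, and the random series stands in for the AR1 data.

```julia
# Chronological split: the first `frac` of the series is used for training,
# the remainder is held out for testing (no shuffling, to respect time order).
function split_series(data::AbstractVector, frac::Real)
    ntrain = floor(Int, frac * length(data))
    return data[1:ntrain], data[ntrain+1:end]
end

series = Float32.(randn(1000))   # placeholder standing in for the AR1 data
train, test = split_series(series, 0.8)

@show length(train) length(test)
```

The batches would then be built from train only, and the forecast error computed on test.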

Thank you so much! That’s exactly the kind of example code I was hoping to see and play around with!

@mcreel that would be a welcome contribution to the model-zoo