Flux: can't get recurrent sequence-to-one model to train

I am attempting to create a simple recurrent model that takes data generated by a parameterized data generating process (DGP) as the input and outputs the parameters of that DGP. The goal is to learn the parameters, given data from the model. The code below tries to learn the parameter of a moving average order 1 (MA(1)) model, given samples of size 100 from the model, with a U(0,1) prior on the MA(1) parameter.
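Concretely, an MA(1) series with parameter ϕ is y_t = e_t + ϕ·e_{t−1}. As a minimal sketch of one draw from this DGP (the helper name `ma1_draw` is just for illustration; it mirrors the `dgp` function in the code below):

```julia
# One MA(1) sample of size n: y_t = e_t + ϕ * e_{t-1},
# with ϕ drawn from the U(0,1) prior.
function ma1_draw(n)
    ϕ = rand()                       # U(0,1) prior on the MA(1) parameter
    e = randn(n + 1)                 # n+1 shocks so y_1 also has a lagged shock
    y = e[2:end] .+ ϕ .* e[1:end-1]
    return y, ϕ
end

y, ϕ = ma1_draw(100)
```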

Because the target output is a scalar, only the final output of the recurrent layer is used to evaluate the loss.

The code runs, but the model does not learn: the loss remains constant during training. I suppose I’ve made some simple mistake, but I can’t find it. Any help would be appreciated!

When run, I get output like

julia> include("test2.jl")
[ Info: Epoch 1
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
[ Info: Epoch 2
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706

julia> 

So, the model does not seem to learn during training. The code is:

# this tries to learn the MA(1) parameter given a sample of size n,
# using a recurrent NN.

using Flux
using Base.Iterators

# define the model
L1 = LSTM(1, 10)    # number of vars in sample by number of learned stats
L2 = Dense(10, 5, tanh)
L3 = Dense(5, 1)
function m(x)
#  Flux.reset!(L1)
  L3(L2((L1.(x))[end]))[]
end

# Data generating process: returns sample of size n from MA(1) model, and the parameter that generated it
function dgp(reps)
    n = 100  # for future: make this random?
    ys = zeros(Float32, reps, n)
    θs = zeros(Float32, reps)
    for i = 1:reps
        ϕ = rand(Float32)
        e = randn(Float32, n+1)
        ys[i,:] = e[2:end] .+ ϕ*e[1:end-1] # MA1
        θs[i] = ϕ
    end
    return ys, θs
end    

# make the data for the net: x is input, θ is output 
nsamples = 1000
x, θ = dgp(nsamples)  # these are a nsamples X 100 matrix, and an nsamples vector
# chunk the data into batches
batches = [(x[ind,:], θ[ind])  for ind in partition(1:size(x,1), 50)]

# train
loss(x,y) = Flux.huber_loss(m(x), y; δ=0.1) # Define the loss function
opt = ADAM(0.001)
evalcb() = @show(loss(x, θ))
Flux.@epochs 2 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))

FWIW, I have managed to get the model to train, using the code below. Using Chain and a made-to-order loss function solved the problem; I suppose my original formulation was not having its parameters tracked (?). This model does not get a very good fit, though perhaps longer training could improve that. I have found that models which first reduce the raw data to statistics, and then pass the statistics to the net as the inputs, fit much better. So this approach may possibly work with more training, but so far, in my work, the statistics-based approach is better in terms of goodness of fit per fixed amount of training.

# this tries to learn the MA(1) parameter given a sample of size n,
# using a recurrent NN.

using Flux
using Base.Iterators

# define the model
L1 = LSTM(1, 10)    # number of vars in sample by number of learned stats
L2 = Dense(10, 10, relu)
L3 = Dense(10, 1)
m = Chain(L1, L2, L3)

# Data generating process: returns sample of size n from MA(1) model, and the parameter that generated it
function dgp(reps)
    n = 100  # for future: make this random?
    xs = zeros(Float32, reps, n)
    θs = zeros(Float32, reps)
    for i = 1:reps
        ϕ = rand(Float32)
        e = randn(Float32, n+1)
        xs[i,:] = e[2:end] .+ ϕ*e[1:end-1] # MA1
        θs[i] = ϕ
    end
    return xs, θs
end    

# make the data for the net: x is input, θ is output 
nsamples = 10000
x, θ = dgp(nsamples)  # these are a nsamples X 100 matrix, and an nsamples vector
# chunk the data into batches
batches = [(x[ind,:], θ[ind])  for ind in partition(1:size(x,1), 50)]

# train
function loss(x,y)
    Flux.reset!(m)
    sqrt.(sum((m.(x)[:,end][1] .-y).^2)/nsamples)
end    
opt = ADAM(0.001)
evalcb() = @show(loss(x, θ))
Flux.@epochs 3 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
opt = ADAM(0.0001)
Flux.@epochs 100 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
@show [m.(x)[:,end] θ]


m was a function in your original example, so params(m) was returning an empty collection, because arbitrary functions don't have parameters associated with them. It's great that you managed to get things running though.
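A quick way to see this (a sketch assuming the implicit-parameters API of Flux from that era, where `Flux.params` collects trainable arrays from layers it knows how to traverse):

```julia
using Flux

L1 = LSTM(1, 10)
f(x) = L1.(x)[end]        # plain function referencing L1: nothing for params to collect
m = Chain(L1)             # Chain registers L1's weights with params

isempty(Flux.params(f))   # true — train! receives no parameters, so the loss never moves
isempty(Flux.params(m))   # false
```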

One thing you could try is using map(m, x)[:,end] instead of broadcasting. There are a couple of bugs around broadcasting RNNs that should be fixed in the next Flux release, and it may be that you're affected by one of them.
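For an ordinary stateless function the two forms are equivalent, which is why the substitution is safe to try; any difference would only show up with stateful recurrent layers hit by the broadcasting bugs mentioned above:

```julia
f(x) = 2x
seq = [1.0, 2.0, 3.0]

# For a stateless function, map and broadcast agree elementwise;
# stateful RNN layers may behave differently under the broadcasting bugs.
map(f, seq) == f.(seq)
```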

Thanks. The change you suggest seems to work the same as my working version. The model learns to map any input to the prior mean pretty quickly, but then does not improve, at least for the amounts of training I've tried. Perhaps a better architecture would solve that. At the moment, this seems not to be a very useful approach: supplying informative statistics to the net, based upon knowledge of the DGP, works much better.
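That plateau is consistent with the net collapsing to the best constant prediction: under squared-error loss, the optimal constant is the prior mean, which for a U(0,1) prior is 0.5. A quick sanity check:

```julia
# RMSE of a constant prediction c against draws from the U(0,1) prior:
θ = rand(100_000)
rmse(c) = sqrt(sum((c .- θ) .^ 2) / length(θ))

rmse(0.5) < rmse(0.3) && rmse(0.5) < rmse(0.7)   # the prior mean beats other constants
```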
