Flux: can't get recurrent sequence-to-one model to train

I am attempting to create a simple recurrent model that takes data generated by a parameterized data generating process (DGP) as the input and outputs the parameters of that DGP. The goal is to learn the parameters, given data from the model. The code below tries to learn the parameter of a moving average order 1 (MA(1)) model, given samples of size 100 from the model, with a U(0,1) prior on the MA(1) parameter.
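Concretely, an MA(1) series with parameter ϕ is y_t = e_t + ϕ·e_{t−1}. As a minimal sketch of one draw from this DGP (the helper name `ma1_draw` is just for illustration; it mirrors the `dgp` function in the code below):

```julia
# One MA(1) sample of size n: y_t = e_t + ϕ * e_{t-1},
# with ϕ drawn from the U(0,1) prior.
function ma1_draw(n)
    ϕ = rand()                       # U(0,1) prior on the MA(1) parameter
    e = randn(n + 1)                 # n+1 shocks so y_1 also has a lagged shock
    y = e[2:end] .+ ϕ .* e[1:end-1]
    return y, ϕ
end

y, ϕ = ma1_draw(100)
```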

Because the target output is a scalar, only the final output of the recurrent layer is used to evaluate the loss.

The code runs, but the model does not learn: the loss remains constant during training. I suppose I’ve made some simple mistake, but I can’t find it. Any help would be appreciated!

When run, I get output like

julia> include("test2.jl")
[ Info: Epoch 1
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
[ Info: Epoch 2
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706
loss(x, θ) = 0.0734574248343706

julia> 

So, the model does not seem to learn during training. The code is:

# this tries to learn the MA(1) parameter given a sample of size n,
# using a recurrent NN.

using Flux
using Base.Iterators

# define the model
L1 = LSTM(1, 10)    # number of vars in sample by number of learned stats
L2 = Dense(10, 5, tanh)
L3 = Dense(5, 1)
function m(x)
#  Flux.reset!(L1)
  L3(L2((L1.(x))[end]))[]
end

# Data generating process: returns sample of size n from MA(1) model, and the parameter that generated it
function dgp(reps)
    n = 100  # for future: make this random?
    ys = zeros(Float32, reps, n)
    θs = zeros(Float32, reps)
    for i = 1:reps
        ϕ = rand(Float32)
        e = randn(Float32, n+1)
        ys[i,:] = e[2:end] .+ ϕ*e[1:end-1] # MA1
        θs[i] = ϕ
    end
    return ys, θs
end    

# make the data for the net: x is input, θ is output 
nsamples = 1000
x, θ = dgp(nsamples)  # these are a nsamples X 100 matrix, and an nsamples vector
# chunk the data into batches
batches = [(x[ind,:], θ[ind])  for ind in partition(1:size(x,1), 50)]

# train
loss(x,y) = Flux.huber_loss(m(x), y; δ=0.1) # Define the loss function
opt = ADAM(0.001)
evalcb() = @show(loss(x, θ))
Flux.@epochs 2 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))

FWIW, I have managed to get the model to train, using the code below. Using Chain and a made-to-order loss function solved the problem; I suppose my original formulation was not having its parameters tracked (?). This model does not get a very good fit, though perhaps longer training could improve that. I have found that models which first reduce the raw data to statistics, and then pass the statistics to the net as the inputs, fit much better. So this approach may possibly work with more training, but so far, in my work, the statistics-based approach is better in terms of goodness of fit per fixed amount of training.

# this tries to learn the MA(1) parameter given a sample of size n,
# using a recurrent NN.

using Flux
using Base.Iterators

# define the model
L1 = LSTM(1, 10)    # number of vars in sample by number of learned stats
L2 = Dense(10, 10, relu)
L3 = Dense(10, 1)
m = Chain(L1, L2, L3)

# Data generating process: returns sample of size n from MA(1) model, and the parameter that generated it
function dgp(reps)
    n = 100  # for future: make this random?
    xs = zeros(Float32, reps, n)
    θs = zeros(Float32, reps)
    for i = 1:reps
        ϕ = rand(Float32)
        e = randn(Float32, n+1)
        xs[i,:] = e[2:end] .+ ϕ*e[1:end-1] # MA1
        θs[i] = ϕ
    end
    return xs, θs
end    

# make the data for the net: x is input, θ is output 
nsamples = 10000
x, θ = dgp(nsamples)  # these are a nsamples X 100 matrix, and an nsamples vector
# chunk the data into batches
batches = [(x[ind,:], θ[ind])  for ind in partition(1:size(x,1), 50)]

# train
function loss(x,y)
    Flux.reset!(m)
    sqrt.(sum((m.(x)[:,end][1] .-y).^2)/nsamples)
end    
opt = ADAM(0.001)
evalcb() = @show(loss(x, θ))
Flux.@epochs 3 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
opt = ADAM(0.0001)
Flux.@epochs 100 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
@show [m.(x)[:,end] θ]


m was a function in your original example, so params(m) was returning an empty collection, because arbitrary functions don't have parameters associated with them. It's great that you managed to get things running though.
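A quick way to see this (a sketch assuming the implicit-parameters API of Flux from that era, where `Flux.params` collects trainable arrays from layers it knows how to traverse):

```julia
using Flux

L1 = LSTM(1, 10)
f(x) = L1.(x)[end]        # plain function referencing L1: nothing for params to collect
m = Chain(L1)             # Chain registers L1's weights with params

isempty(Flux.params(f))   # true — train! receives no parameters, so the loss never moves
isempty(Flux.params(m))   # false
```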

One thing you could try is using map(m, x)[:,end] instead of broadcasting. There are a couple of bugs around broadcasting RNNs that should be fixed in the next Flux release, and it may be that you're affected by one of them.
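For an ordinary stateless function the two forms are equivalent, which is why the substitution is safe to try; any difference would only show up with stateful recurrent layers hit by the broadcasting bugs mentioned above:

```julia
f(x) = 2x
seq = [1.0, 2.0, 3.0]

# For a stateless function, map and broadcast agree elementwise;
# stateful RNN layers may behave differently under the broadcasting bugs.
map(f, seq) == f.(seq)
```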

Thanks. The change you suggest seems to work the same as my working version. The model learns to map any input to the prior mean pretty quickly, but then does not improve, at least for the amounts of training I've tried. Perhaps a better architecture would solve that. At the moment, this seems not to be a very useful approach: supplying informative statistics to the net, based upon knowledge of the DGP, works much better.
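That plateau is consistent with the net collapsing to the best constant prediction: under squared-error loss, the optimal constant is the prior mean, which for a U(0,1) prior is 0.5. A quick sanity check:

```julia
# RMSE of a constant prediction c against draws from the U(0,1) prior:
θ = rand(100_000)
rmse(c) = sqrt(sum((c .- θ) .^ 2) / length(θ))

rmse(0.5) < rmse(0.3) && rmse(0.5) < rmse(0.7)   # the prior mean beats other constants
```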
