FWIW, I have managed to get the model to train, using the code below. Using Chain and a made-to-order loss function solved the problem; I suppose that my original formulation was not having its parameters tracked (?). This model does not get a very good fit, though perhaps longer training could improve that. I have found that models which first compute summary statistics from the raw data and then pass those statistics to the net as inputs fit much better (a rough sketch of that approach is at the end of this post). So, this approach may possibly work with more training, but, so far, in my work, the other approach is better in terms of goodness of fit per fixed amount of training.
# this tries to learn the MA(1) parameter given a sample of size n,
# using a recurrent NN.
using Flux
using Base.Iterators
# define the model
L1 = LSTM(1, 10) # number of vars in sample by number of learned stats
L2 = Dense(10, 10, relu)
L3 = Dense(10, 1)
m = Chain(L1, L2, L3)
# Data generating process: returns sample of size n from MA(1) model, and the parameter that generated it
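# In equation form, each simulated series is x_t = e_t + ϕ*e_{t-1}, with e_t ~ N(0,1)
# and ϕ drawn from U(0,1) separately for each sample.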
function dgp(reps)
    n = 100 # for future: make this random?
    xs = zeros(Float32, reps, n)
    θs = zeros(Float32, reps)
    for i = 1:reps
        ϕ = rand(Float32)
        e = randn(Float32, n+1)
        xs[i,:] = e[2:end] .+ ϕ*e[1:end-1] # MA(1)
        θs[i] = ϕ
    end
    return xs, θs
end
# make the data for the net: x is input, θ is output
nsamples = 10000
x, θ = dgp(nsamples) # these are an nsamples × 100 matrix and an nsamples vector
# chunk the data into batches
batches = [(x[ind,:], θ[ind]) for ind in partition(1:size(x,1), 50)]
# train
function loss(x, y)
    Flux.reset!(m) # reset the recurrent state before each pass
    ŷ = [z[1] for z in m.(x)[:, end]] # prediction for each sample, taken at the last time step
    sqrt(sum((ŷ .- y).^2) / length(y)) # RMSE over the batch
end
opt = ADAM(0.001)
evalcb() = @show(loss(x, θ))
Flux.@epochs 3 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
opt = ADAM(0.0001)
Flux.@epochs 100 Flux.train!(loss, Flux.params(m), batches, opt, cb = Flux.throttle(evalcb, 1))
@show [first.(m.(x)[:, end]) θ] # compare the net's predictions with the true parameters
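For reference, here is a rough sketch of the statistics-based approach I mentioned above. The choice of statistics (sample variance and first-order autocorrelation, which together identify the MA(1) parameter), the network sizes, and the helper names (stats, mstat, statloss) are just illustrative, not exactly the model I use, but it shows the idea: condense each series into a few statistics and feed those to a small dense net.
using Statistics
# map each sample (a row of x) to a vector of summary statistics
function stats(x)
    s = zeros(Float32, size(x,1), 2)
    for i = 1:size(x,1)
        z = x[i,:]
        s[i,1] = var(z)                      # for MA(1) with unit-variance errors: 1 + ϕ^2
        s[i,2] = cor(z[1:end-1], z[2:end])   # first-order autocorrelation: ϕ/(1 + ϕ^2)
    end
    return s
end
mstat = Chain(Dense(2, 10, relu), Dense(10, 1))
s = stats(x)
statbatches = [(s[ind,:]', θ[ind]') for ind in partition(1:size(s,1), 50)]
statloss(x, y) = sqrt(sum((mstat(x) .- y).^2) / length(y))
Flux.@epochs 10 Flux.train!(statloss, Flux.params(mstat), statbatches, ADAM(0.001))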