Problems with implementing a basic DeepAR algorithm in Julia

I am trying to implement a basic DeepAR to serve as a baseline for comparison against other time-series forecasting algorithms. However, when I test the code below, it does not seem to learn an example AR(3) autoregressive process well, so I suspect I have made a mistake somewhere. Could someone help me? I am not very familiar with the DeepAR architecture.

losses = []
optim = Flux.setup(Flux.Adam(1e-2), model)
@showprogress for (batch_Xₜ, batch_Xₜ₊₁) in zip(loaderXtrain, loaderYtrain)
    loss, grads = Flux.withgradient(model) do m
        likelihood = 0
        for (x, y) in zip(batch_Xₜ[2:end], batch_Xₜ₊₁[2:end])
            μ, logσ = m([x])
            σ = softplus(logσ)
            ŷ = rand(Normal(μ, σ))
            likelihood += log(sqrt(2π)) + log(σ) + (y - ŷ)^2 / (2σ^2)
        end
        likelihood
    end
    Flux.update!(optim, model, grads[1])
    push!(losses, loss)
end
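The per-step term accumulated above is meant to be the Gaussian negative log-likelihood. As a standalone sketch (note it compares the observation to μ directly, whereas my loop compares it to a sample ŷ; I am not sure which is correct for DeepAR):

```julia
# Negative log-likelihood of one observation y under N(μ, σ²),
# written against the observed y directly (no sampling step).
gaussian_nll(y, μ, σ) = log(sqrt(2π)) + log(σ) + (y - μ)^2 / (2σ^2)

gaussian_nll(0.0, 0.0, 1.0)  # == log(sqrt(2π))
```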

Can you provide a complete reproducible example? What makes you say it does not learn well?

Thank you very much for your message! Actually, the rest of the code is quite simple. I am attaching it.

ar_hparams = ARParams(;
    ϕ=[0.5f0, 0.3f0, 0.2f0],
    x₁=rand(Normal(0.0f0, 1.0f0)),
    noise=Normal(0.0f0, 0.2f0),
)

n_series = 100

loaderXtrain, loaderYtrain, loaderXtest, loaderYtest = generate_batch_train_test_data(
    n_series, ar_hparams
)

model = Chain(
    RNN(1 => 10, relu), RNN(10 => 10, relu), Dense(10 => 16, relu), Dense(16 => 2, identity)
)
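The last Dense layer outputs two values, which I split into μ and a raw scale that goes through softplus so σ stays positive. As a self-contained check (softplus redefined here so the snippet runs without Flux):

```julia
# softplus: log(1 + eˣ), maps any real input to a strictly positive value;
# Flux.softplus does the same thing.
softplus(x) = log1p(exp(x))

out = Float32[0.3f0, -1.2f0]      # pretend network output: (μ, raw scale)
μ, σ = out[1], softplus(out[2])   # σ > 0 for any raw input
```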

ARParams holds the parameters of the autoregressive process, and generate_batch_train_test_data generates n_series realizations of it. loaderXtrain contains the autoregressive series and loaderYtrain contains the same series shifted one step to the right. That is, if loaderXtrain[1] (strictly, collect(loaderXtrain)[1]) represents an autoregressive process X_t, then loaderYtrain[1] is the process X_{t+1}.
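generate_batch_train_test_data itself is long, but the idea is just this (a simplified sketch with made-up names; the real version uses ARParams and returns data loaders):

```julia
# One realization of an AR(p) process: x_t = ϕ₁x_{t-1} + … + ϕ_p x_{t-p} + ε_t,
# with ε_t ~ N(0, σ²). Initial p values are drawn at random.
function generate_ar(ϕ::Vector{Float32}, n::Int; σ=0.2f0)
    p = length(ϕ)
    x = zeros(Float32, n)
    x[1:p] .= randn(Float32, p)
    for t in (p + 1):n
        x[t] = sum(ϕ[k] * x[t - k] for k in 1:p) + σ * randn(Float32)
    end
    return x
end

series = generate_ar([0.5f0, 0.3f0, 0.2f0], 200)
Xtrain = series[1:end-1]   # X_t
Ytrain = series[2:end]     # X_{t+1}: same series shifted one step
```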

Well, I don't think it is learning, judging by the loss after training (I've tried different learning rates and parameters and the results are the same). And looking at how it approximates the series used for training, it clearly does not get good results; in fact, the solution after training is practically identical to the solution before training.

I'm sorry if the example is not minimal. If you need more information, I will try to upload a minimal, reproducible example tomorrow. The thing is that the code for generating the AR(p) process turned out longer than it should be. I just wanted to know whether I've made some big mistake that I'm not seeing, or whether DeepAR is supposed to perform this poorly.