Hello.

I am trying to train a custom model on sequential data with Flux.jl. The loss function uses a for loop that feeds the network's output back in as input at each step of the sequence. With the code below, the gradient calculation takes a long time. Strangely, almost all of that time is reported as compilation even when I run the same call repeatedly, as if the code were recompiled on every call. I'm not sure how to fix this.

```
using Flux
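function loss(model1, model2, xs, y0, ŷs)
    l = zero(eltype(xs[begin]))  # accumulate the loss in the element type of the inputs
    y = y0
    for (i, x) in enumerate(xs)
        h = model1(vcat(x, y))  # feed the previous output back in alongside the current input
        y = model2(h)
        l += Flux.mse(ŷs[i], sum(y))
    end
    return l
end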
xs = [randn(Float32, (16)) for _ in 1:8]
y0 = randn(Float32, (16))
ŷs = [[1f0] for _ in 1:8]
m1 = Dense(32=>16)
m2 = Dense(16=>16)
```

```
julia> @time gradient(m1,m2) do m1,m2
           loss(m1, m2, xs, y0, ŷs)
       end
3.957895 seconds (27.08 M allocations: 1.390 GiB, 8.21% gc time, 99.87% compilation time: 7% of which was recompilation)
julia> @time gradient(m1,m2) do m1,m2
           loss(m1, m2, xs, y0, ŷs)
       end
0.059321 seconds (379.88 k allocations: 20.261 MiB, 98.93% compilation time)
```

I would appreciate it if you could provide guidance on how to address this problem.

Thank you.