I’ve not really used this myself, but a quick glance at the docs seems to suggest you have to do that yourself.
In many situations, such as when dealing with a language model, the sentences in each batch are independent (i.e. the last item of the first sentence of the first batch is independent of the first item of the first sentence of the second batch), so we cannot treat each batch as the direct continuation of the previous one. To handle such situations, we need to reset the state of the model between batches, which can conveniently be done within the loss function:
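
    using Flux
    using Flux.Losses: mse

    # `m` is the recurrent model, defined elsewhere; reset its hidden state before each batch
    function loss(x, y)
        Flux.reset!(m)
        sum(mse(m(xi), yi) for (xi, yi) in zip(x, y))
    end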
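To see why the reset matters, here is a minimal plain-Julia sketch that does not use Flux at all. `ToyRNN`, `reset_state!` and `seq_loss` are hypothetical names made up for illustration: `ToyRNN` is a one-number recurrent cell that keeps its hidden state between calls, and `reset_state!` plays the role that `Flux.reset!` plays for a real model. Evaluating the same sequence twice gives the same loss only if the state is reset first.

```julia
# Hypothetical toy recurrent cell (not Flux's API): the hidden state `h`
# persists between calls, like the state of a stateful recurrent layer.
mutable struct ToyRNN
    w::Float64   # recurrent weight
    h::Float64   # hidden state carried from one call to the next
end

# Start with a zero hidden state.
ToyRNN(w) = ToyRNN(w, 0.0)

# One time step: combine the previous hidden state with the new input.
function (m::ToyRNN)(x::Real)
    m.h = tanh(m.w * m.h + x)
    return m.h
end

# Hypothetical stand-in for Flux.reset!: restore the initial hidden state.
reset_state!(m::ToyRNN) = (m.h = 0.0; m)

# Squared-error loss over one sequence. With `reset = true` the state is
# cleared first, so the result does not depend on earlier sequences.
function seq_loss(m, xs, ys; reset = true)
    reset && reset_state!(m)
    return sum((m(x) - y)^2 for (x, y) in zip(xs, ys))
end

m = ToyRNN(0.5)
xs, ys = [1.0, 0.5, -0.2], [0.7, 0.6, 0.1]

# Same loss twice, because each call starts from a fresh state.
seq_loss(m, xs, ys), seq_loss(m, xs, ys)

# Without the reset, the second call starts from the state left by the first.
reset_state!(m)
seq_loss(m, xs, ys; reset = false), seq_loss(m, xs, ys; reset = false)
```

In the real Flux version the same idea applies: calling `Flux.reset!(m)` at the top of the loss function is what makes each batch start from the model's initial state.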