Different loss with Zygote when taking gradients vs. without

I am working on the newest Flux branch and noticed some weird behavior.

When I calculate the loss of my model’s outputs, I get the following number:

```julia
using Flux

y_pred = model(x)
loss(y_pred, y)
```

However, doing the same calculation while taking gradients gives a different result:

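```julia
ps = params(model)  # `params` already returns a `Params`, so wrapping it in `Params(...)` is not needed
gs = gradient(ps) do
    y_pred = model(x)
    l = loss(y_pred, y)  # this loss value differs from the one computed above
    return l
end
```
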
Maybe there is something obvious I’m missing; otherwise I’ll probably have to put together a minimal working example.
Thanks for any help!
