Zygote (the automatic differentiation engine underlying Flux) works differently from PyTorch's autograd: it doesn't keep a tape. If you want gradients, you call Flux.gradient; if you don't, you just call the function as usual.
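For example, a minimal sketch (the function `f` and the input value are just placeholders):

```julia
using Flux

f(x) = 3x^2 + 2x

f(2.0)                     # plain call: no gradient machinery involved
g = Flux.gradient(f, 2.0)  # gradient on demand: returns a tuple, here (14.0,)
```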
The optimizer has nothing to do with it; the optimizer just handles how the gradient is used to update the model.
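Roughly, with the `Flux.setup` / `Flux.update!` API from recent Flux versions (the model, loss, and data below are made up for illustration):

```julia
using Flux

model = Dense(2 => 1)                      # toy model
opt_state = Flux.setup(Adam(0.01), model)  # optimizer state, kept separately

x, y = randn(Float32, 2, 8), randn(Float32, 1, 8)  # dummy data

# Step 1: Zygote computes the gradient (only because we asked for it).
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)

# Step 2: the optimizer decides how that gradient updates the parameters.
Flux.update!(opt_state, model, grads[1])
```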