I’m implementing a model that uses the gradients of a feedforward network with respect to its inputs as part of the loss function. I then need to train the network by differentiating that loss with respect to the network’s parameters. I can’t get this working: it fails with `ERROR: Mutating arrays is not supported`.

Here is a minimal example. In my actual model I need to do something more complicated than simply summing the gradients, but this captures the error.

```
using Flux, Zygote

net = Dense(10, 1)
x = randn(10, 128) # dims, batch

# gradient of the network output with respect to the input x
function pred(x, net)
    y, pullback = Zygote.pullback(net, x)
    grads = pullback(fill!(similar(y), 1))[1] # seed of ones for the pullback
    return grads
end

gradient(() -> sum(pred(x, net)), params(net))
```
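For what it’s worth, I also tried building the seed without explicit mutation, replacing `fill!(similar(y), 1)` with a broadcast `one.(y)` (my assumption is that these produce the same array of ones). This avoids the mutating call, but I’m not sure it’s the idiomatic way to take nested gradients, and I don’t know whether other Zygote-over-Zygote limitations still apply:

```
using Flux, Zygote

net = Dense(10, 1)
x = randn(10, 128) # dims, batch

function pred(x, net)
    y, pullback = Zygote.pullback(net, x)
    grads = pullback(one.(y))[1] # non-mutating seed of ones
    return grads
end

gradient(() -> sum(pred(x, net)), params(net))
```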

I’m quite comfortable with Python/PyTorch, but I’m feeling totally lost with Julia/Flux. What is the right way to do this? It seems superficially similar to the gradient penalty in WGAN-GP, but I can’t find a Flux implementation of that either.