I’m implementing a model whose loss function uses the gradients of a feedforward network with respect to its inputs. I then need to train the network by differentiating that loss with respect to the network’s parameters. I can’t get this working; I keep hitting

```
ERROR: Mutating arrays is not supported.
```
Here is a minimal example. In my actual model I need to do something more complicated than simply summing the gradients, but this reproduces the error.
```julia
using Flux, Zygote

net = Dense(10, 1)
x = randn(Float32, 10, 128)  # (dims, batch)

function pred(x, net)
    # Gradient of the network output with respect to its input x
    y, back = Zygote.pullback(net, x)
    grads = back(fill!(similar(y), 1))[1]  # seed of ones; fill! is the mutation Zygote complains about
    return grads
end

gradient(() -> sum(pred(x, net)), params(net))
```
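For reference, here is the variant I’d expect to avoid the mutation, simply swapping the `fill!(similar(y), 1)` seed for a freshly allocated `ones` array. I’m not sure this is the idiomatic Flux approach for second-order gradients, so treat it as a sketch rather than a working solution:

```julia
using Flux, Zygote

net = Dense(10, 1)
x = randn(Float32, 10, 128)  # (dims, batch)

function pred(x, net)
    y, back = Zygote.pullback(net, x)
    # ones(...) allocates the seed without mutating an existing array,
    # so Zygote should be able to differentiate through this a second time
    return back(ones(Float32, size(y)))[1]
end

gradient(() -> sum(pred(x, net)), params(net))
```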
I’m quite comfortable with Python/PyTorch, but I’m feeling totally lost with Julia/Flux. What is the right way to do this? It’s superficially similar to the gradient penalty in WGAN-GP, but I can’t find a Flux implementation of that either.