Taking the derivative of a scalar loss that itself involves a gradient errors on GPU only

I opened an issue on Zygote.jl: "Taking nested gradient for implementing Wasserstein GAN with gradient penalty (WGAN-GP) on GPU" (Issue #1262, FluxML/Zygote.jl, github.com)
