How to add norm of gradient to a loss function?

OK, this is tricky to accomplish with Zygote alone, but you could try mixing AD packages, as in "Zygote push forward unable to differentiate through generic broadcast - #10 by RynoLaubscher". Unfortunately, I don't have any personal experience with that approach, so if you can't get it working and nobody else replies, I'd recommend asking in #autodiff on Slack.
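
For context on why a single AD pass is not enough here, this is a rough sketch of the objective in question. The symbols are my own assumptions, not from the original question: $\ell(\theta; x)$ is the base loss, $\theta$ the parameters, $x$ the input the gradient is taken with respect to, and $\lambda$ a hypothetical penalty weight. I also use the squared norm for simplicity.

```latex
% Assumed setup: base loss \ell(\theta; x), hypothetical penalty weight \lambda \ge 0,
% gradient penalty taken with respect to the input x (an assumption).
\[
  L(\theta) \;=\; \ell(\theta; x) \;+\; \lambda \,\bigl\lVert \nabla_x \ell(\theta; x) \bigr\rVert_2^2
\]
% Differentiating the penalty with respect to \theta brings in mixed second derivatives:
\[
  \nabla_\theta L(\theta)
  \;=\; \nabla_\theta \ell(\theta; x)
  \;+\; 2\lambda \left( \frac{\partial\, \nabla_x \ell(\theta; x)}{\partial \theta} \right)^{\!\top} \nabla_x \ell(\theta; x)
\]
```

The second term is a gradient of a gradient, so whatever computes $\nabla_\theta L$ has to differentiate through the inner gradient call. That nested differentiation is where Zygote on its own tends to struggle, and it is why running one AD package over another is worth a try.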
