I want to differentiate generic Julia code that predates Zygote or is otherwise unconcerned with compatibility with it. For example, I can’t expect StatsBase’s `mean` function to use Zygote’s buffers as a way to fix FluxML/Zygote.jl issue #1128, “gradient() fails on array mutation for `mean(f, x; dims)`”.
There is a small but pivotal difference between that proposal and my own. Rather than poison the gradient, I poison the data and set the gradient to zero (ideally a structural zero). A key objection to ChainRules#521 is “that it will cause any other rule which has captured x to give wrong answers” (@mcabbott here). By poisoning the data, any other rule that captured the value before it was mutated will throw instead of silently returning a wrong answer.
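
To make the difference concrete, here is a toy sketch in plain Julia. It is only one possible reading of the idea, not Zygote's or ChainRules' actual machinery: `PoisonableVector`, `poison!` and the hand-written `pullback_sum` are hypothetical names invented for this illustration. The assumption is that the pullback of the mutation, which cannot restore the old contents, marks the data as poisoned and contributes a zero gradient, so an earlier rule that captured `x` fails loudly when it later reads it.

```julia
# Hypothetical wrapper type, invented for this sketch (not part of Zygote or ChainRules).
mutable struct PoisonableVector{T} <: AbstractVector{T}
    data::Vector{T}
    poisoned::Bool
end
PoisonableVector(data::Vector{T}) where {T} = PoisonableVector{T}(data, false)

Base.size(p::PoisonableVector) = size(p.data)
Base.IndexStyle(::Type{<:PoisonableVector}) = IndexLinear()

# Every read checks the poison flag first.
function Base.getindex(p::PoisonableVector, i::Int)
    p.poisoned && error("read of an array whose earlier contents were lost to mutation")
    return p.data[i]
end

function Base.setindex!(p::PoisonableVector, v, i::Int)
    p.data[i] = v
    return p
end

# Hypothetical helper: mark the data as unusable for anyone who captured it.
poison!(p::PoisonableVector) = (p.poisoned = true; p)

x = PoisonableVector([1.0, 2.0, 3.0])

# A hand-written "rule" for sum(abs2, x); its pullback closes over x.
y = sum(abs2, x)
pullback_sum = dy -> [2 * x[i] * dy for i in eachindex(x)]

# Forward pass: x is mutated after the rule above captured it.
x[1] = 10.0

# Without poisoning, the captured rule silently uses the mutated value.
silently_wrong = pullback_sum(1.0)

# Assumed reverse pass of the mutation: poison the data, contribute a zero gradient.
poison!(x)
grad_from_mutation = zeros(length(x))

# The earlier rule now throws instead of returning a wrong gradient.
threw = try
    pullback_sum(1.0)
    false
catch err
    err isa ErrorException
end
```

In this sketch `silently_wrong` has `20.0` in its first entry where the correct gradient is `2.0`, which is the failure mode the objection describes; after `poison!(x)` the same call throws. A dense `zeros` vector stands in for the zero gradient here, whereas a structural zero such as ChainRulesCore's `ZeroTangent()` would be the more natural choice in a real rule.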