Yes, after thinking some more, passing something back in the gradient won’t always be enough. That’s what 521 looked at, and it’s also how I read this, as a suggestion to alter the gradient representation.
Now you seem to be saying that the object with this extra flag is present on the forward pass. In that case you can simulate the effect by having the pullback write NaN into the original a. (Or just restore the values it held before the mutation.)
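To make the "restore the original values" option concrete, here is a minimal sketch in plain Julia. The name fill_with_pullback! and the (value, pullback) return shape are hypothetical and only for illustration; this is not the ChainRules API, and it assumes the forward pass is allowed to keep a copy of the array.

```julia
# Hypothetical hand-written rule: returns the mutated array and a pullback closure.
function fill_with_pullback!(xs::AbstractArray, y)
    saved = copy(xs)              # remember the pre-mutation values
    fill!(xs, y)
    function fill_pullback(dxs)
        copyto!(xs, saved)        # undo the mutation, so earlier pullbacks see the values they ran with
        dx_old = zero(dxs)        # the old contents of xs do not influence the result
        dy = sum(dxs)             # y was written into every element
        return dx_old, dy
    end
    return xs, fill_pullback
end

xs = [1.0, 2.0, 3.0]
out, back = fill_with_pullback!(xs, 7.0)   # xs is now all 7.0
dx, dy = back(ones(3))                     # xs is restored here
```

The cost is one copy of xs per mutating call. Writing NaN into xs instead would avoid the copy, but it only flags stale reads rather than making them correct.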
What this still won’t solve is that, when the return value of fill!(xs, y) is discarded, its pullback won’t receive the gradient for the new xs. In the spirit of trying to make simple cases work you could, modulo (1) and (2) above, have the pullback return NotImplemented for both dx and dy. That could perhaps let something like x[1]=0 work when you don’t want the gradient of x. But it won’t help for sum(Float32[x for _ in 1:3]).
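The discarded-return-value problem can be sketched by hand. The pullback helpers below are hypothetical and written in plain Julia, and the wiring assumes an AD that attaches gradients to values rather than to memory, so the gradient from sum lands on the variable xs and never reaches the unused output of fill!.

```julia
# Hypothetical hand-written pullbacks, for illustration only.
sum_pullback(xs) = dout -> fill(dout, size(xs))                 # gradient of sum(xs) w.r.t. xs
fill_pullback(xs) = dxs_new -> (zero(dxs_new), sum(dxs_new))    # (gradient of old xs, gradient of y)

xs = zeros(3)
y = 2.0
fill!(xs, y)          # return value discarded
out = sum(xs)

# Reverse pass as wired under the assumption above:
dxs = sum_pullback(xs)(1.0)                      # attached to the variable xs
_, dy_seen = fill_pullback(xs)(zero(xs))         # fill!'s output was unused, so it gets a zero cotangent

# What the pullback would compute if the gradient had been routed to the new xs:
_, dy_wanted = fill_pullback(xs)(dxs)
```

Here dy_seen is zero even though out depends on y through every element, which is the silent wrong answer that returning NotImplemented would turn into an error.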