Is it possible to perform some computation in my loss function that is excluded from the gradient, while still using the simple built-in training loop?
TensorFlow and Keras have something called `stop_gradient`, which can be applied to an output to signal that it should be treated as a constant (https://www.tensorflow.org/api_docs/python/tf/stop_gradient). This is very handy when programming things like contrastive divergence or expectation maximization, where part of the computation graph should not be taken into account in the loss gradients.
Otherwise I can write the training loop myself, but having a `stop_gradient` in Keras was so handy that I wonder whether Flux could have something similar.
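For illustration, here is a rough sketch of what I have in mind, built on Zygote's `@adjoint` mechanism. The name `stopgrad` is just a placeholder of mine, and the EM-style loss is a toy example, not a real model:

```julia
using Flux, Zygote

# Hypothetical stop-gradient: identity on the forward pass,
# while the backward pass treats its argument as a constant.
stopgrad(x) = x
Zygote.@adjoint stopgrad(x) = stopgrad(x), _ -> nothing

m = Dense(2, 2)
x = rand(Float32, 2, 4)

# Toy EM-style loss: the responsibilities are computed from the model
# (like an E-step) but should be treated as constants when differentiating.
function loss(x)
    resp = stopgrad(softmax(m(x)))         # held constant w.r.t. the gradient
    return -sum(resp .* logsoftmax(m(x)))  # only this path contributes gradients
end

gs = gradient(() -> loss(x), Flux.params(m))
```

Here the gradient only flows through the `logsoftmax` branch; the responsibilities are treated as constants, which is exactly the behaviour `stop_gradient` gives in TensorFlow.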