```
# -*- coding: utf-8 -*-
"""
Computation of *moments* of gradients through tensorflow operations.
Tensorflow is typically used for empirical risk minimization with gradient-based
optimization methods. That is, we want to adjust trainable variables ``W``,
such as to minimize an objective quantity, called ``LOSS``, of the form
    LOSS(W) = (1/n) * sum_{i=1}^{n} loss(W, d_i),
that is, the mean of the individual losses induced by the ``n`` training data
points ``d_i``. Consequently, the gradient of ``LOSS`` w.r.t. the variables ``W`` is
the mean of individual gradients ``dloss(W, d_i)``. These individual gradients
are not computed separately when we call ``tf.gradients`` on the aggregate
``LOSS``. Instead, they are implicitly aggregated by the operations in the
backward graph. This batch processing is crucial for the computational
efficiency of the gradient computation.
This module provides functionality to compute the ``p``-th moment of the
individual gradients, i.e. the quantity
```
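The mechanics described above can be illustrated without TensorFlow. The following is a minimal NumPy sketch (not the module's actual API, which is truncated here) showing for a 1-D linear model that the gradient of the aggregate ``LOSS`` equals the mean of the individual gradients ``dloss(W, d_i)``, and how a ``p``-th moment of those individual gradients could be formed:

```python
# Hypothetical NumPy illustration; the model, data, and helper
# `gradient_moment` are assumptions, not part of the quoted module.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)      # inputs of the n = 5 data points d_i
y = rng.normal(size=5)      # targets
w = 0.3                     # trainable variable W

# Individual losses: loss(w, d_i) = (w * x_i - y_i)**2,
# so the individual gradients are dloss/dw = 2 * (w * x_i - y_i) * x_i.
residual = w * x - y
individual_grads = 2.0 * residual * x

# Gradient of LOSS(w) = (1/n) * sum_i loss(w, d_i): by linearity it is
# the mean of the individual gradients.
grad_of_mean_loss = np.mean(2.0 * residual * x)
assert np.isclose(grad_of_mean_loss, individual_grads.mean())

def gradient_moment(grads, p):
    """p-th moment of the individual gradients: (1/n) * sum_i grads_i**p."""
    return np.mean(grads ** p)

# The second moment minus the squared mean gives the gradient variance.
second_moment = gradient_moment(individual_grads, 2)
variance = second_moment - individual_grads.mean() ** 2
assert np.isclose(variance, individual_grads.var())
```

In TensorFlow itself the individual gradients are never materialized by ``tf.gradients``; recovering their moments efficiently is exactly what motivates intercepting the backward-graph operations, as the docstring goes on to describe.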
