-
Δ -> (Δ, Δ)
is essentially defining \frac{\partial L}{\partial \text{add(a, b)}} \rightarrow \left( \frac{\partial L}{\partial \text{a}}, \frac{\partial L}{\partial \text{b}} \right). If we want to calculate the gradient ofadd(a, b)
wrt toa
andb
, thenL
would beadd(a, b)
andΔ
is therefore 1. Generally speaking,gradient
is just syntatic sugar that will lower down to the adjoint and plug 1 in. - Defining the adjoint is the reason how Zygote knows the derivative! Substituting Δ with 1 gives you the derivatives (1, 1).
This example is rather trivial, but customizing the adjoint offers incredible flexibility. You can make Zygote work with your own data type, or use mathematical knowledge to aid automatic differentiation.