Zygote: meaning of @adjoint add(a, b) = add(a, b), Δ -> (Δ, Δ)

  1. Δ -> (Δ, Δ) is essentially defining \frac{\partial L}{\partial \text{add(a, b)}} \rightarrow \left( \frac{\partial L}{\partial \text{a}}, \frac{\partial L}{\partial \text{b}} \right). If we want to calculate the gradient of add(a, b) wrt to a and b, then L would be add(a, b) and Δ is therefore 1. Generally speaking, gradient is just syntatic sugar that will lower down to the adjoint and plug 1 in.
  2. Defining the adjoint is the reason how Zygote knows the derivative! Substituting Δ with 1 gives you the derivatives (1, 1).

This example is rather trivial, but customizing the adjoint offers incredible flexibility. You can make Zygote work with your own data type, or use mathematical knowledge to aid automatic differentiation.