Ok, I think I got the connection between `Zygote.gradient` and the Wirtinger derivative now. It seems like I can in fact use a single call to `Zygote.gradient`, because my function $J$ is real-valued, $J \in \mathbb{R}$ (whereas the above `wirtinger` function also handles the more general case of $J \in \mathbb{C}$). Specifically:
- `Zygote.gradient(func, Ψ)` is equivalent to `y, back = Zygote.pullback(func, Ψ); back(1.0)[1]`, according to the Zygote documentation.
- `Zygote.pullback` falls back to `ChainRules.rrule` (or, at least, is compatible with `ChainRules.rrule`).
- Given a function $J: \mathbb{C}^N \rightarrow \mathbb{R}$, and `z⃗, back = Zygote.pullback(J, z⃗₀)`, the pullback for the imaginary unit is zero, that is, `dv = back(1im)[1] = 0` in the `wirtinger` function:
  - According to the ChainRules documentation, for a function $\mathbb{C} \rightarrow \mathbb{C}$ defined as $f(x+iy) = u(x,y) + iv(x, y)$, the `rrule` returns $\Delta u \, \tfrac{\partial u}{\partial x} + \Delta v \, \tfrac{\partial v}{\partial x} + i \, \Bigl(\Delta u \, \tfrac{\partial u }{\partial y} + \Delta v \, \tfrac{\partial v}{\partial y} \Bigr)$, where $(\Delta u, \Delta v)$ is the adjoint that we feed into `rrule`. In our case, $\Delta u = 0$, $\Delta v = 1$ (because we're feeding in `1im`), and $u=J$, $v=0$ (because $J \in \mathbb{R}$). Thus the entire pullback is zero.
  - The pullback for a vector function is the sum of the scalar pullbacks for the individual components, so the argument that `back(1im)[1] = 0` still holds.

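Written out, the substitution in the argument above looks as follows. This is only a restatement with the same symbols, for real-valued $J$ (so $u = J$ and $v \equiv 0$); the names $\texttt{back(1im)}$ and $\texttt{back(1.0)}$ stand for the scalar pullback evaluated with the respective adjoint:

$$
\begin{aligned}
\Delta u = 0,\ \Delta v = 1: \quad & \texttt{back(1im)} = \tfrac{\partial v}{\partial x} + i \, \tfrac{\partial v}{\partial y} = 0, \\
\Delta u = 1,\ \Delta v = 0: \quad & \texttt{back(1.0)} = \tfrac{\partial J}{\partial x} + i \, \tfrac{\partial J}{\partial y}.
\end{aligned}
$$

The second line is the quantity that a single gradient call returns, and it is twice the conjugate Wirtinger derivative $\tfrac{\partial J}{\partial z^*} = \tfrac{1}{2}\bigl(\tfrac{\partial J}{\partial x} + i \, \tfrac{\partial J}{\partial y}\bigr)$.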
Thus, in the `wirtinger` function, `du = Zygote.gradient(func, Ψ)[1]` and `dv = 0`, and the two outputs are simply `du'/2` and `du/2`. That is, the gradient up to a factor of two, as I was observing.
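
A quick numerical check of this conclusion might look like the sketch below. The test function `J` and the point `z0` are made up for illustration, and I am assuming that `wirtinger` combines the two pullbacks as `(du' + im*dv')/2` and `(du + im*dv)/2`, which is what the outputs `du'/2` and `du/2` suggest. The `nothing` guard is a precaution, since Zygote can represent a zero gradient as `nothing`.

```julia
using Zygote

# Hypothetical real-valued test function J: ℂ^N → ℝ (not from the original post)
J(z) = sum(abs2, z)
z0 = [1.0 + 2.0im, -0.5 + 0.25im]

# The two pullback evaluations that the wirtinger function relies on
y, back = Zygote.pullback(J, z0)
du = back(1.0)[1]
dv = back(1im)[1]
# Precaution: treat a `nothing` gradient as an explicit zero array
dv = dv === nothing ? zero(du) : dv

# Single-call alternative
g = Zygote.gradient(J, z0)[1]

# Assumed combination used by `wirtinger` (see the note above)
dz    = (du' + im * dv') / 2
dzbar = (du + im * dv) / 2

@assert g ≈ du            # gradient is the pullback of 1.0
@assert all(iszero, dv)   # pullback of 1im vanishes for real-valued J
@assert dz ≈ g' / 2       # first output: du'/2
@assert dzbar ≈ g / 2     # second output: du/2
```

If the assertions pass for your own `func`, a single `Zygote.gradient` call followed by a division by two should be enough; this is not guaranteed for functions whose rules do not discard the imaginary part of the adjoint.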