Taking Complex Autodiff Seriously in ChainRules

Okay, so after some very productive and stimulating conversation, this recent push has resulted in ChainRulesCore.jl/pull/167 (thanks @ettersi!) and ChainRules.jl/pull/196. I must say it feels like we’ve come up with a really great set of shared definitions and conventions.

The result of @ettersi’s now-merged PR can be viewed in all of its LaTeX glory here:

Perhaps next up, we can lift some ideas from here: https://fluxml.ai/Zygote.jl/dev/complex/ such as the example showing how to get the Wirtinger derivatives, $\frac{\partial}{\partial z}$ and $\frac{\partial}{\partial \bar{z}}$, from the Jacobian, i.e.

using ChainRules
function wirtinger_rrule(f, z)
    _, pullback = rrule(f, z)
    du, dv = pullback(1)[2], pullback(im)[2]
    (du' + im*dv')/2, (du + im*dv)/2
end

wirtinger_rrule(abs2, 1 + im)

: (1.0 - 1.0im, 1.0 + 1.0im)

function wirtinger_frule(f, z)
    du_dx, dv_dx = reim(frule((Zero(), 1),f,z)[2])
    du_dy, dv_dy = reim(frule((Zero(),im),f,z)[2])

    (du_dx + dv_dy + im*(-du_dy + dv_dx))/2, (du_dx - dv_dy + im*(du_dy + dv_dx))/2
end

wirtinger_frule(sin, 1 + im)

: (0.8337300251311491 - 0.9888977057628651im, 0.0 + 0.0im)
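
For reference, here is a short sketch of where the two formulas above come from. It writes $f = u + iv$ and $z = x + iy$, and it assumes (this is my reading of the convention, not something stated in the code itself) that `pullback(1)` returns the gradient of $u$ packed as $u_x + i u_y$ and `pullback(im)` returns the gradient of $v$ packed as $v_x + i v_y$. The subscript notation $u_x = \partial u / \partial x$ is introduced here only for brevity.

```latex
\begin{align*}
\frac{\partial f}{\partial z}
  &= \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right)
   = \frac{1}{2}\Big[(u_x + v_y) + i\,(v_x - u_y)\Big], \\
\frac{\partial f}{\partial \bar{z}}
  &= \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)
   = \frac{1}{2}\Big[(u_x - v_y) + i\,(v_x + u_y)\Big].
\end{align*}
% Reverse mode, with du = u_x + i u_y and dv = v_x + i v_y (assumed convention):
\begin{align*}
\frac{\overline{du} + i\,\overline{dv}}{2}
  &= \frac{1}{2}\Big[(u_x + v_y) + i\,(v_x - u_y)\Big] = \frac{\partial f}{\partial z}, \\
\frac{du + i\,dv}{2}
  &= \frac{1}{2}\Big[(u_x - v_y) + i\,(v_x + u_y)\Big] = \frac{\partial f}{\partial \bar{z}}.
\end{align*}
```

The first pair matches the expression returned by `wirtinger_frule` term by term, and the second pair matches `wirtinger_rrule`. For a holomorphic function such as `sin`, the Cauchy-Riemann equations $u_x = v_y$ and $u_y = -v_x$ make $\partial f / \partial \bar{z}$ vanish, which is consistent with the `0.0 + 0.0im` in the output above.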

Since you mention the Zygote docs: I have an open PR where I tried to clarify the relationship with Wirtinger derivatives a bit more:

Should be pretty much finished, just got lost in my backlog a bit.


I am not sure what you mean here. AFAICT, the “only” remaining obstacle to introducing Wirtinger derivatives is to implement the chaining of chain rules as done e.g. in Zygote. Once that is done, Wirtinger derivatives can be provided using either of the functions you mentioned.


Oh, I just meant adding that to our documentation. The functions I showed above work today. I’ll try to open a PR soon.
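
In case it is useful for such a docs PR, here is a minimal, dependency-free sketch for sanity-checking the values quoted above with central finite differences. The helper name `wirtinger_fd` and the step size `h` are hypothetical choices of mine and not part of ChainRules; the check only relies on the definitions of the Wirtinger derivatives in terms of ∂f/∂x and ∂f/∂y.

```julia
# Hypothetical helper (not part of ChainRules): approximate both Wirtinger
# derivatives of f at z using central finite differences along x and y.
function wirtinger_fd(f, z; h = 1e-6)
    df_dx = (f(z + h) - f(z - h)) / (2h)            # ≈ ∂f/∂x
    df_dy = (f(z + im * h) - f(z - im * h)) / (2h)  # ≈ ∂f/∂y
    # ∂f/∂z = (∂f/∂x - i ∂f/∂y)/2 and ∂f/∂z̄ = (∂f/∂x + i ∂f/∂y)/2
    return (df_dx - im * df_dy) / 2, (df_dx + im * df_dy) / 2
end

z = 1 + im
dz, dzbar = wirtinger_fd(abs2, z)  # should be close to conj(z) and z
hz, hzbar = wirtinger_fd(sin, z)   # should be close to cos(z) and 0
```

Being a finite-difference approximation, this only agrees with the rule-based functions up to discretization and rounding error, so comparisons should use a tolerance rather than exact equality.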


I am not sure whether this paper has already been mentioned in this thread, but it is very relevant to the discussion. The authors have also implemented their ideas in Julia code:

Guo, Chu, and Dario Poletti. “A scheme for automatic differentiation of complex loss functions.”


Highly relevant indeed! Thanks for pointing this out.

Their proposal is the same as what we came up with here, just approached from a different angle. That gives further validation that this convention is most likely the right thing to do.