Automatic differentiation of complex valued functions

Thanks for the info and clarification; it makes a lot of sense that forward diff of this case is relatively easy. I know this is not simple in general and it’s awesome to see the progress being made. To be honest I’m just following this stuff from afar and cheering from the sidelines.

A while back I encountered the Wirtinger derivatives in classical field theory in combination with functional derivatives (appendix A here). I was super confused at the time because there seems to be a lot of folklore/vague thinking in physics about how to correctly work with these things. I’m not even sure there was a solid consensus that they should be called Wirtinger derivatives. Since then some people have worked on the Wikipedia page, so perhaps that problem has finally been fixed! Anyway, these days I like to point people at the terminology because these derivatives are both elegant and useful.

3 Likes

@c42f
I’ve also adventured down the long lonesome path of using Wirtinger derivatives. Couldn’t agree more - the literature on them is confusing, sometimes contradictory, and mostly cryptic. Not that the concept is “difficult”, but troubleshooting them was a huge nightmare. Kind of got the feeling that there was some potential fundamental problem with them and they were a kind of dirty secret or something.

Back before these were supported (are they supported now? still unsure), I was manually making complex neural network topologies with these puppies, using generic code. Not a lot of fun, but some cool results.

I’d love to see more efforts to tame Wirtinger derivatives.

I’ve never been able to pin down precisely why the Wirtinger calculus seemed confusing to me while I used it. I thought that the literature was just confusing but maybe there’s other reasons.

I think of them as the things you need for talking about linear approximations to nonholomorphic complex functions without decomposing into real and imaginary parts. That is, the things you need in front of (z-z_0) and (z-z_0)^* such that

f(z) = f(z_0) + \frac{\partial f}{\partial z} (z - z_0) + \frac{\partial f}{\partial z^*} (z - z_0)^* + O(|z - z_0|^2)

is true. Just asking for this leads to a lot of nice algebraic properties which should make working with them as easy as calculus on \mathbb{R}^2.
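That expansion is easy to check numerically. Here's a minimal sketch (my own, not from the thread), using central differences in the x and y directions to form the Wirtinger pair for a nonholomorphic example f(z) = z² + 3z^*:

```python
# Numerically check the first-order Wirtinger expansion
#   f(z) ≈ f(z0) + (df/dz)(z - z0) + (df/dz*)(z - z0)*
# for the nonholomorphic example f(z) = z**2 + 3*conj(z).
# (Function and step sizes are my own choices for illustration.)

def f(z):
    return z**2 + 3*z.conjugate()

def wirtinger(f, z0, h=1e-6):
    """Central-difference Wirtinger derivatives from the x/y partials:
    d/dz = (d/dx - i d/dy)/2,  d/dz* = (d/dx + i d/dy)/2."""
    dfdx = (f(z0 + h) - f(z0 - h)) / (2*h)
    dfdy = (f(z0 + 1j*h) - f(z0 - 1j*h)) / (2*h)
    return (dfdx - 1j*dfdy) / 2, (dfdx + 1j*dfdy) / 2

z0 = 1.0 + 2.0j
dz = 1e-3 * (0.6 + 0.8j)           # small offset z - z0
dfdz, dfdzbar = wirtinger(f, z0)   # expect 2*z0 and 3

linear = f(z0) + dfdz*dz + dfdzbar*dz.conjugate()
print(abs(f(z0 + dz) - linear))    # residual is O(|dz|^2)
```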

But for some reason I’ll admit I had to pay attention when using them. Perhaps it’s just because there’s this second z^* term which always appears and is easy to forget unless you abuse notation and write f(z,z^*) instead of f(z) as a mnemonic device for inserting \frac{\partial f}{\partial z^*}.

It’s interesting that they’re still causing trouble for AD.

2 Likes

I don’t really understand the fuss about Wirtinger derivatives. As far as AD is concerned, simply consider complex numbers as pairs of numbers, and that’s it. Wirtinger derivatives are a rotation of this representation, chosen so that holomorphic functions have one vanishing derivative.
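Concretely (my own sketch, not from the post): if you have the four real partials of f = u + iv as a map from pairs to pairs, the Wirtinger pair is just a fixed complex linear combination of them:

```python
# The Wirtinger pair as a change of basis on the four real partials of
# f = u + iv viewed as a map R^2 -> R^2:
#   df/dz  = ((u_x + v_y) + i(v_x - u_y)) / 2
#   df/dz* = ((u_x - v_y) + i(v_x + u_y)) / 2
# Checked on f(z) = z**2, i.e. u = x^2 - y^2, v = 2xy (my own example).

x, y = 1.5, -0.7
ux, uy = 2*x, -2*y   # partials of u = x^2 - y^2
vx, vy = 2*y, 2*x    # partials of v = 2xy

dfdz    = ((ux + vy) + 1j*(vx - uy)) / 2
dfdzbar = ((ux - vy) + 1j*(vx + uy)) / 2

z = complex(x, y)
print(dfdz, dfdzbar)   # 2z and 0, since z**2 is holomorphic
```

The vanishing of `dfdzbar` here is exactly the "rotation chosen so that holomorphic functions have one vanishing derivative".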

2 Likes

My understanding is that you need Wirtinger derivatives to differentiate real functions of a complex variable, like

f(z) = \vert z \vert^2 = z\, z^\ast,

in other words, functions which do not even have complex derivatives. These functions are everywhere in signal processing; look at digital filter synthesis, optimization problems involving power, etc. This is a very nice review:

Candan, Cagatay. “Properly Handling Complex Differentiation in Optimization and Approximation Problems.” IEEE Signal Processing Magazine 36.2 (2019): 117-124.

I’m looking forward to ChainRules.jl supporting them!
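For the norm-squared example above, a quick numeric check (my own sketch, not from the review) confirms the textbook result \partial f/\partial z = z^* and \partial f/\partial z^* = z:

```python
# Wirtinger derivatives of f(z) = |z|^2 = z z*, via central differences.
# f has no ordinary complex derivative, but the Wirtinger pair is fine.

def wirt(f, z, h=1e-6):
    dfdx = (f(z + h) - f(z - h)) / (2*h)
    dfdy = (f(z + 1j*h) - f(z - 1j*h)) / (2*h)
    return (dfdx - 1j*dfdy) / 2, (dfdx + 1j*dfdy) / 2

z = 2.0 - 1.5j
dfdz, dfdzbar = wirt(lambda w: w * w.conjugate(), z)
print(dfdz, dfdzbar)   # ≈ conj(z) and z
```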

No, it’s not necessary. They’re just a particular set of directional derivatives; using the derivatives in the real and imaginary components would do just the same thing. The reason they’re useful is that analytic functions give you a bunch of hard zeros with Wirtinger derivatives (\partial f/\partial z^* = 0 if f is analytic), so the hope would be that the compiler could exploit this to simplify a bunch of expressions when you’re using Wirtinger derivatives and many of the functions you encounter are analytic.
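Those hard zeros are easy to see numerically. A small sketch (my own, assuming nothing beyond the thread's definitions): the analytic `exp` has \partial f/\partial z^* \approx 0, while the anti-analytic `conj` has \partial f/\partial z \approx 0.

```python
import cmath

# "Hard zero" check: for analytic f, df/dz* vanishes; for conj, df/dz does.

def wirt(f, z, h=1e-6):
    dfdx = (f(z + h) - f(z - h)) / (2*h)
    dfdy = (f(z + 1j*h) - f(z - 1j*h)) / (2*h)
    return (dfdx - 1j*dfdy) / 2, (dfdx + 1j*dfdy) / 2

z = 0.4 + 0.9j
dedz, dedzbar = wirt(cmath.exp, z)               # analytic
dcdz, dcdzbar = wirt(lambda w: w.conjugate(), z) # anti-analytic
print(abs(dedzbar), abs(dcdz))                   # both ≈ 0
```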

4 Likes

Agreed. The fact that they have a special name and a set of folklore around them has confused me several times into thinking they were something other than just very standard derivatives.

I guess the problem is that many people with a lot of experience in complex analysis are used to dealing only with complex analytic functions, so they think of complex functions as one-dimensional. They should know better, though: if you asked them, they’d tell you that the complex numbers can be mapped onto a 2D vector space.

I feel like there’s a separate thing hiding in plain sight here which makes the Wirtinger calculus confusing: an entanglement with the algebra of complex numbers (complex multiplication and conjugation). I’ve felt in the past that this somewhat defies a clear geometric interpretation.

To take one term from the linear approximation formula…

\frac{\partial f}{\partial z} \cdot (z - z_0)

there, I’ve added the offending \cdot in explicitly.

While the Wirtinger derivatives are exactly “just directional derivatives” of a 2-vector valued function, their interaction with the complex delta z-z_0 and its conjugate always seemed essentially algebraic to me.

Does anyone have a clear geometric interpretation they would like to share?

You’ve put your finger on an interesting and subtle issue. One view is to look at complex algebra through the language of Geometric Algebra.

Consider a normed vector space in two dimensions with directional basis elements e_1 and e_2. The standard approach would be to define things like dot products and cross / exterior products between these basis vectors and leave it at that. However, another approach is to treat them algebraically.

As basis vectors, we want e_i e_i = e_i^2 = 1. What about e_1 e_2? Define a vector v = v_1 e_1 + v_2 e_2. As a standard normed vector space, we want

v^2 = v_1^2 + v_2^2

but simply multiplying out v^2, we find

v^2 = v_1^2 e_1^2 + v_2^2 e_2^2 + v_1 e_1 v_2 e_2 + v_2 e_2 v_1 e_1

How do we make these two statements match up in general? We already have e_i e_i = e_i^2 = 1, and if we take v_i to just be numbers, then we need the product e_1e_2 = -e_2 e_1. Hence, our basis vectors need to anti-commute under multiplication.

We can then define the dot product between vectors as

v \cdot u \equiv {1 \over 2}(vu + uv)

and the exterior product as

v \wedge u \equiv {1 \over 2}(vu - uv)

We also notice that

(e_1e_2)^2 = e_1e_2e_1e_2 = -e_1e_1 e_2e_2 = -1

This means that e_1e_2 is an imaginary unit in our vector space!

We can map any vector v = v_1 e_1 + v_2 e_2 onto a complex number via multiplication on the left by e_1:

e_1 v = v_1 + v_2 e_1e_2

Hence, any statement defined on the 2D vector space with unit elements \{e_1, e_2\} can be freely translated into statements about complex numbers spanned by \{1,~e_1e_2\}.

The algebraic qualities of complex numbers are fully equivalent to vector space qualities in 2D.
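The whole construction can be modelled concretely with 2x2 matrices (my own representation, not from the post; e.g. e_1 and e_2 as the symmetric Pauli-like matrices), which lets you check the anticommutation, the imaginary unit e_1 e_2, and the e_1 v map:

```python
# A 2x2-matrix model of the 2D geometric algebra:
# e_i^2 = 1, e_1 e_2 = -e_2 e_1, and (e_1 e_2)^2 = -1.

def mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I  = [[1, 0], [0, 1]]
e1 = [[1, 0], [0, -1]]
e2 = [[0, 1], [1, 0]]

e12 = mul(e1, e2)                 # the bivector e1 e2, our imaginary unit
assert mul(e1, e1) == I and mul(e2, e2) == I
assert mul(e2, e1) == [[-a for a in row] for row in e12]   # anticommute
assert mul(e12, e12) == [[-1, 0], [0, -1]]                 # (e1 e2)^2 = -1

# Map a vector v = v1 e1 + v2 e2 to a "complex number" v1 + v2 e1e2
# by left-multiplying by e1:
v1, v2 = 3, 4
v = [[v1*e1[i][j] + v2*e2[i][j] for j in range(2)] for i in range(2)]
zmat = mul(e1, v)
print(zmat)   # v1*I + v2*e12
```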


What’s more, this approach allows one to extend a lot of the powerful algebraic properties of complex numbers to higher dimensions. This is why many proponents of geometric algebra say that it unites complex numbers and geometry. A good book to check out if you’re interested is Geometric Algebra for Physicists by Doran and Lasenby. Let me know if you would like help finding a PDF copy.

6 Likes

I find the following chain of thought helpful:

Any nice complex function can be expanded as a Laurent series about 0.

Therefore, any nice complex function f can be written as f(z) = u(z) + v(z^*), with u and v analytic, by setting u to the z^n part of the series and v to the (1/z)^n part.

The function z \to u(z) has a normal complex derivative u'. If f is written using z, z^*, and analytic functions, you can substitute w for z^* to get f(z,w) = u(z) + v(w), so u' is the “partial derivative” of f with respect to z, holding z^* constant. This part has a Taylor series in (z - z_0).

The other part, z \to v(z^*), has complex conjugate derivatives, and a Taylor series in (z - z_0)^*.
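This chain of thought also pins down what the Wirtinger pair should be for such a split. A quick check (my own example, not from the post): with u(z) = z² and v(w) = w³, so f(z) = u(z) + v(z^*), one expects \partial f/\partial z = u'(z) and \partial f/\partial z^* = v'(z^*):

```python
# Wirtinger derivatives of the split f(z) = u(z) + v(z*), with
# u(z) = z**2 and v(w) = w**3. Expect df/dz = 2z and df/dz* = 3*conj(z)**2.

def wirt(f, z, h=1e-6):
    dfdx = (f(z + h) - f(z - h)) / (2*h)
    dfdy = (f(z + 1j*h) - f(z - 1j*h)) / (2*h)
    return (dfdx - 1j*dfdy) / 2, (dfdx + 1j*dfdy) / 2

f = lambda z: z**2 + z.conjugate()**3
z = 0.8 + 0.3j
dfdz, dfdzbar = wirt(f, z)
print(dfdz, dfdzbar)   # ≈ 2z and 3*conj(z)**2
```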

3 Likes

Thanks @thisrod, that is nice; a good algebraic reason why the apparent abuse of notation in writing f(z,z^*) as a stand-in for f(z) is actually a very minor abuse.

@Mason thanks for the connection to geometric algebra that’s very cool. It’s long been something I’ve thought I should know more about (but ahem was too lazy / busy writing code to read about in detail :grimacing:). So the geometric product with a fixed vector provides a mapping from vectors to things like v_1 + v_2e_1e_2 which are isomorphic to the complex numbers. That’s an interesting step but there seems to be a bunch more steps to fill in. I guess I should do that reading!

2 Likes