So these conversations have been starting up recently again on the Slack #autodiff channel and in places like https://github.com/JuliaDiff/ChainRulesCore.jl/issues/159.

I think there’s a lot of misconceptions out there about the derivatives of functions of complex numbers, caused in part by confusing notation and in part by many people’s education about complex numbers focusing too heavily on the holomorphic case. However, in an AD system where we want to deal with *general* code, we can not limit ourselves to only talking about holomorphic functions.

I’m posting this here for now so we can discuss the ideas and use LaTeX for math, but once these ideas get discussed a bit, it’d be good to turn this conversation into a GitHub issue / PR in ChainRules.jl.

# Common background

First of all, we can always treat a function of a complex input f(z) as a function of two real inputs f(x, y) where z = x +i y. We can further write f(x, y) = u(x, y) + i v(x, y) where u and v are real valued, i.e. u(x, y) and v(x, y) are the real and complex parts of f(z).

From this perspective, a \mathbb{C} \rightarrow \mathbb{C} function is a function which takes in two inputs (x,~ y) and gives two outputs (u,~ v). Hence, the first derivatives of f can be expressed if you like as a Jacobian matrix

A holomorphic function is a special class of functions for which the following holds:

which means that there is a symmetry (scarcity) in in the above Jacobian matrix and not all elements need to be computed. **This is not true of arbitrary complex functions**, but the symmetry does give optimization opportunities and makes our lives easier so I think it’s worth respecting.

Explicity, for a holomorphic function f, the Jacobian can be written

## Wirtinger Derivatives

People often like to define two new derivative operators (autodiff people like to call these “Wirtinger derivatives”)

and now for a holomorphic function, f, we have \partial f(z)/ \partial \bar{z} = 0 identically. This is why people sometimes say that a holomorphic function is a function with a **single** degree of freedom even though it appears to live in a two dimensional space (the complex plane), and they will write non-holomorphic functions as f(z, \bar{z}) to signify that f depends not only on z, but it’s complex conjugate \bar{z}.

In terms of these Wirtinger derivatives, our above Jacobian is written

There are nice geometric interpretations of this representation, but I’ll refrain.

We can do a unitary transformation on this Jacobian matrix to get

(It’s possible we really want the transpose of this thing? I’m not sure yet, open to suggestions.) Note that this is a `2`

element complex vector instead of a `2x2`

real matrix, but it contains the same amount of information. At this point, you might prefer to call this a gradient instead of a Jacobian.

Currently, ChainRules.jl seems to only handle the holomorphic part ({\partial f / \partial z}) of the Jacobian, but not the anti-holomorphic part ({\partial f / \partial \bar{z}}). For instance,

```
julia> using ChainRules
julia> rrule(sqrt, 1 + im)[2](1)
(Zero(), 0.3884434935075093 - 0.16089856322639565im)
julia> rrule(abs2, 1 + im)[2](1)
(Zero(), 2 + 2im)
```

If I understand correctly, this is saying that ChainRules doesn’t provide any indication that `abs2`

is not a holomorphic function, and in general any derivatives that are obtained from functions involving `abs2`

will be incorrect.

# Proposal?

I think that as nice as the Wirtinger representation can be conceptually, there are real advantage of working in terms of x,~y,~u,~v. It is closer to how Julia actually represents complex numbers internally (we store the real and imaginary parts) and makes it much more explicit that we are really talking about the **Jacobian** of a function with two inputs and two outputs, not some scalar derivative like the Wirtiger derivative of a holomorphic function might lead you to beleive. In the x,~y,~u~v representation, the property of being Holomorphic is an example of Jacobian *scarcity* (@ChrisRackauckas is the person in these parts to talk to about scarcity as far as I’m aware).

However, when I started cooking up some smaple code to illustrate this proposal, I realized that writing the `rrule`

for say `cos(z::Complex)`

was actually quite ugly in terms of x,~y,~u,~v. I think the answer is just to provide Wirtinger style interfaces for the rules, but don’t represent that data in that way.

Here’s a toy example of the interface I’m imagining. The idea is that `ComplexJacobian`

s are basically 2x2 StaticArrays, but we also define a constructor to convert a `HolomorphicJacobian`

to a single `Complex`

number if that’s needed.

```
using StaticArrays
abstract type AbstractComplexJacobian{T} <: StaticArray{Tuple{2, 2}, T, 2} end
function Base.show(io::IO, ::MIME"text/plain", cj::AbstractComplexJacobian)
println(io, typeof(cj))
println(io, " ∂u_∂x=$(cj.∂u_∂x) ∂u_∂y=$(cj.∂u_∂y)")
println(io, " ∂v_∂x=$(cj.∂v_∂x) ∂v_∂y=$(cj.∂v_∂y)")
end
Base.size(::AbstractComplexJacobian) = (2, 2)
Base.length(::AbstractComplexJacobian) = 4
struct HolomorphicJacobian{T} <: AbstractComplexJacobian{T}
∂u_∂x::T
∂v_∂x::T
end
Complex(hj::HolomorphicJacobian{T}) where {T<:Real} = hj.∂u_∂x + im*hj.∂v_∂x
struct ComplexJacobian{T} <: AbstractComplexJacobian{T}
∂u_∂x::T
∂v_∂x::T
∂u_∂y::T
∂v_∂y::T
end
function Base.getproperty(hj::HolomorphicJacobian, s::Symbol)
if s === :∂u_∂y
-hj.∂v_∂x
elseif s === :∂v_∂y
hj.∂u_∂x
else
getfield(hj, s)
end
end
function Base.getindex(cj::AbstractComplexJacobian, i::Int)
s = (:∂u_∂x, :∂v_∂x, :∂u_∂y, :∂v_∂y)[i]
getproperty(cj, s)
end
function Base.getindex(cj::AbstractComplexJacobian, i::Int, j::Int)
s = (:∂u_∂x, :∂v_∂x, :∂u_∂y, :∂v_∂y)[i + 2(j-1)]
getproperty(cj, s)
end
function wirtinger(::Type{HolomorphicJacobian})
function (∂f_∂z)
HolomorphicJacobian(real(∂f_∂z), imag(∂f_∂z))
end
end
function wirtinger(::Type{ComplexJacobian})
function (∂f_∂z, ∂f_∂z̄)
ComplexJacobian(real(∂f_∂z) + real(∂f_∂z̄),
imag(∂f_∂z) + imag(∂f_∂z̄),
imag(∂f_∂z) - imag(∂f_∂z̄),
real(∂f_∂z) - real(∂f_∂z̄))
end
end
```

```
julia> wirtinger(HolomorphicJacobian)(1 + im)
HolomorphicJacobian{Int64}
∂u_∂x=1 ∂u_∂y=-1
∂v_∂x=1 ∂v_∂y=1
julia> HolomorphicJacobian(1, 2)
HolomorphicJacobian{Int64}
∂u_∂x=1 ∂u_∂y=-2
∂v_∂x=2 ∂v_∂y=1
julia> ComplexJacobian(1, 2, 3, 4)
ComplexJacobian{Int64}
∂u_∂x=1 ∂u_∂y=3
∂v_∂x=2 ∂v_∂y=4
```

Here are some sample chain rules using this interface:

```
function ChainRules.rrule(::typeof(cos), z::Complex)
sinz, cosz = sincos(z)
cosz, ∂cos(Δz) = Δz * wirtinger(HolomorphicJacobian)(-sinz)
end
function ChainRules.rrule(::typeof(sin), z::Complex)
sinz, cosz = sincos(z)
sinz, ∂sin(Δz) = Δz * wirtinger(HolomorphicJacobian)(cosz)
end
function ChainRules.rrule(::typeof(abs2), z::Complex)
# abs2(z) = z*z̄
# ∂abs2/∂z = z̄
# ∂abs2/∂z̄ = z
abs2(z), ∂abs2(Δz) = Δz * wirtinger(ComplexJacobian)(conj(z), z)
end
```

One potential advantage of this representation is that for \mathbb{C}^n \rightarrow \mathbb{C}^m functions, we can easily choose at the level of the chainrule if we want to return a `Matrix{<:ComplexJacobian}`

or a `ComplexJacobian{<:Matrix}`

.

Sorry for the long winded, and yet still too naive post. Let me know what you think! (cc @oxinabox)