So these conversations have been starting up recently again on the Slack #autodiff channel and in places like https://github.com/JuliaDiff/ChainRulesCore.jl/issues/159.
I think there’s a lot of misconceptions out there about the derivatives of functions of complex numbers, caused in part by confusing notation and in part by many people’s education about complex numbers focusing too heavily on the holomorphic case. However, in an AD system where we want to deal with general code, we can not limit ourselves to only talking about holomorphic functions.
I’m posting this here for now so we can discuss the ideas and use LaTeX for math, but once these ideas get discussed a bit, it’d be good to turn this conversation into a GitHub issue / PR in ChainRules.jl.
Common background
First of all, we can always treat a function of a complex input f(z) as a function of two real inputs f(x, y) where z = x +i y. We can further write f(x, y) = u(x, y) + i v(x, y) where u and v are real valued, i.e. u(x, y) and v(x, y) are the real and complex parts of f(z).
From this perspective, a \mathbb{C} \rightarrow \mathbb{C} function is a function which takes in two inputs (x,~ y) and gives two outputs (u,~ v). Hence, the first derivatives of f can be expressed if you like as a Jacobian matrix
A holomorphic function is a special class of functions for which the following holds:
which means that there is a symmetry (scarcity) in in the above Jacobian matrix and not all elements need to be computed. This is not true of arbitrary complex functions, but the symmetry does give optimization opportunities and makes our lives easier so I think it’s worth respecting.
Explicity, for a holomorphic function f, the Jacobian can be written
Wirtinger Derivatives
People often like to define two new derivative operators (autodiff people like to call these “Wirtinger derivatives”)
and now for a holomorphic function, f, we have \partial f(z)/ \partial \bar{z} = 0 identically. This is why people sometimes say that a holomorphic function is a function with a single degree of freedom even though it appears to live in a two dimensional space (the complex plane), and they will write non-holomorphic functions as f(z, \bar{z}) to signify that f depends not only on z, but it’s complex conjugate \bar{z}.
In terms of these Wirtinger derivatives, our above Jacobian is written
There are nice geometric interpretations of this representation, but I’ll refrain.
We can do a unitary transformation on this Jacobian matrix to get
(It’s possible we really want the transpose of this thing? I’m not sure yet, open to suggestions.) Note that this is a 2
element complex vector instead of a 2x2
real matrix, but it contains the same amount of information. At this point, you might prefer to call this a gradient instead of a Jacobian.
Currently, ChainRules.jl seems to only handle the holomorphic part ({\partial f / \partial z}) of the Jacobian, but not the anti-holomorphic part ({\partial f / \partial \bar{z}}). For instance,
julia> using ChainRules
julia> rrule(sqrt, 1 + im)[2](1)
(Zero(), 0.3884434935075093 - 0.16089856322639565im)
julia> rrule(abs2, 1 + im)[2](1)
(Zero(), 2 + 2im)
If I understand correctly, this is saying that ChainRules doesn’t provide any indication that abs2
is not a holomorphic function, and in general any derivatives that are obtained from functions involving abs2
will be incorrect.
Proposal?
I think that as nice as the Wirtinger representation can be conceptually, there are real advantage of working in terms of x,~y,~u,~v. It is closer to how Julia actually represents complex numbers internally (we store the real and imaginary parts) and makes it much more explicit that we are really talking about the Jacobian of a function with two inputs and two outputs, not some scalar derivative like the Wirtiger derivative of a holomorphic function might lead you to beleive. In the x,~y,~u~v representation, the property of being Holomorphic is an example of Jacobian scarcity (@ChrisRackauckas is the person in these parts to talk to about scarcity as far as I’m aware).
However, when I started cooking up some smaple code to illustrate this proposal, I realized that writing the rrule
for say cos(z::Complex)
was actually quite ugly in terms of x,~y,~u,~v. I think the answer is just to provide Wirtinger style interfaces for the rules, but don’t represent that data in that way.
Here’s a toy example of the interface I’m imagining. The idea is that ComplexJacobian
s are basically 2x2 StaticArrays, but we also define a constructor to convert a HolomorphicJacobian
to a single Complex
number if that’s needed.
using StaticArrays
abstract type AbstractComplexJacobian{T} <: StaticArray{Tuple{2, 2}, T, 2} end
function Base.show(io::IO, ::MIME"text/plain", cj::AbstractComplexJacobian)
println(io, typeof(cj))
println(io, " ∂u_∂x=$(cj.∂u_∂x) ∂u_∂y=$(cj.∂u_∂y)")
println(io, " ∂v_∂x=$(cj.∂v_∂x) ∂v_∂y=$(cj.∂v_∂y)")
end
Base.size(::AbstractComplexJacobian) = (2, 2)
Base.length(::AbstractComplexJacobian) = 4
struct HolomorphicJacobian{T} <: AbstractComplexJacobian{T}
∂u_∂x::T
∂v_∂x::T
end
Complex(hj::HolomorphicJacobian{T}) where {T<:Real} = hj.∂u_∂x + im*hj.∂v_∂x
struct ComplexJacobian{T} <: AbstractComplexJacobian{T}
∂u_∂x::T
∂v_∂x::T
∂u_∂y::T
∂v_∂y::T
end
function Base.getproperty(hj::HolomorphicJacobian, s::Symbol)
if s === :∂u_∂y
-hj.∂v_∂x
elseif s === :∂v_∂y
hj.∂u_∂x
else
getfield(hj, s)
end
end
function Base.getindex(cj::AbstractComplexJacobian, i::Int)
s = (:∂u_∂x, :∂v_∂x, :∂u_∂y, :∂v_∂y)[i]
getproperty(cj, s)
end
function Base.getindex(cj::AbstractComplexJacobian, i::Int, j::Int)
s = (:∂u_∂x, :∂v_∂x, :∂u_∂y, :∂v_∂y)[i + 2(j-1)]
getproperty(cj, s)
end
function wirtinger(::Type{HolomorphicJacobian})
function (∂f_∂z)
HolomorphicJacobian(real(∂f_∂z), imag(∂f_∂z))
end
end
function wirtinger(::Type{ComplexJacobian})
function (∂f_∂z, ∂f_∂z̄)
ComplexJacobian(real(∂f_∂z) + real(∂f_∂z̄),
imag(∂f_∂z) + imag(∂f_∂z̄),
imag(∂f_∂z) - imag(∂f_∂z̄),
real(∂f_∂z) - real(∂f_∂z̄))
end
end
julia> wirtinger(HolomorphicJacobian)(1 + im)
HolomorphicJacobian{Int64}
∂u_∂x=1 ∂u_∂y=-1
∂v_∂x=1 ∂v_∂y=1
julia> HolomorphicJacobian(1, 2)
HolomorphicJacobian{Int64}
∂u_∂x=1 ∂u_∂y=-2
∂v_∂x=2 ∂v_∂y=1
julia> ComplexJacobian(1, 2, 3, 4)
ComplexJacobian{Int64}
∂u_∂x=1 ∂u_∂y=3
∂v_∂x=2 ∂v_∂y=4
Here are some sample chain rules using this interface:
function ChainRules.rrule(::typeof(cos), z::Complex)
sinz, cosz = sincos(z)
cosz, ∂cos(Δz) = Δz * wirtinger(HolomorphicJacobian)(-sinz)
end
function ChainRules.rrule(::typeof(sin), z::Complex)
sinz, cosz = sincos(z)
sinz, ∂sin(Δz) = Δz * wirtinger(HolomorphicJacobian)(cosz)
end
function ChainRules.rrule(::typeof(abs2), z::Complex)
# abs2(z) = z*z̄
# ∂abs2/∂z = z̄
# ∂abs2/∂z̄ = z
abs2(z), ∂abs2(Δz) = Δz * wirtinger(ComplexJacobian)(conj(z), z)
end
One potential advantage of this representation is that for \mathbb{C}^n \rightarrow \mathbb{C}^m functions, we can easily choose at the level of the chainrule if we want to return a Matrix{<:ComplexJacobian}
or a ComplexJacobian{<:Matrix}
.
Sorry for the long winded, and yet still too naive post. Let me know what you think! (cc @oxinabox)