How to forward differentiate a complex function?

Hello,

I have a very basic question, but I haven’t found any solution. I want to differentiate a holomorphic complex function.

My function is

f(z) = [z^3 + exp(im*z), exp(im*z^2) + 1]

And I differentiate it using DifferentiationInterface.jl:

using DifferentiationInterface, FiniteDiff, ForwardDiff

f2(z) = f(z[1])

_, j1 = DifferentiationInterface.value_and_jacobian(f2, AutoFiniteDiff(), [1.0+2.0im])

j1

_, j2 = DifferentiationInterface.value_and_jacobian(f2, AutoForwardDiff(), [1.0+2.0im])

j2

But ForwardDiff.jl returns the error

ERROR: ArgumentError: Cannot create a dual over scalar type ComplexF64. If the type behaves as a scalar, define ForwardDiff.can_dual(::Type{ComplexF64}) = true.

What does setting can_dual do? Why isn’t it already set? If I do set it, I then get

ERROR: ArgumentError: Cannot create a dual over scalar type Any. If the type behaves as a scalar, define ForwardDiff.can_dual(::Type{Any}) = true.

If I also set this, I get

ERROR: MethodError: no method matching signbit(::ComplexF64)
The function `signbit` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  signbit(::Float32)
   @ Base floatfuncs.jl:16
  signbit(::Bool)
   @ Base bool.jl:151
  signbit(::Float16)
   @ Base floatfuncs.jl:17

Using Zygote

It works using Zygote.jl:

using Zygote

function wirtinger(f, x)
  y, back = Zygote.pullback(f, x)

  # Pull back the standard basis vectors (and i times them) to get the
  # derivatives with respect to the real and imaginary directions.
  du = [back([i == j ? 1 : 0 for i in 1:length(y)])[1] for j in 1:length(y)]
  dv = [back([i == j ? im : 0 for i in 1:length(y)])[1] for j in 1:length(y)]
  (conj.(du) + im*conj.(dv))/2, (du + im*dv)/2
end

wirtinger(f, 1+2im)[1]

Complex numbers aren’t supported by ForwardDiff. It’s definitely a deliberate omission, although I’m not the one to explain why. I believe it relates to ambiguity/ill-posedness. Setting can_dual for Complex won’t work, in any case.

One way to input/output complex numbers is as pairs of real numbers. Usually something like this (untested, so there may be a small mistake here):

DifferentiationInterface.value_and_jacobian(rc -> reinterpret(Float64, f2(ComplexF64(rc[1], rc[2]))), AutoFiniteDiff(), [1.0,2.0])

(Note that exp(im*x) can be written as cis(x), although there isn’t much difference between the two when x is complex.)
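
To recover the holomorphic derivative of each output component from the resulting 4×2 real Jacobian, you can use the fact that f'(z) = \partial(\text{Re}\, f)/\partial x + i \, \partial(\text{Im}\, f)/\partial x for holomorphic f. A sketch under the same untested caveat:

using DifferentiationInterface, FiniteDiff

_, J = DifferentiationInterface.value_and_jacobian(
    rc -> reinterpret(Float64, f2(ComplexF64(rc[1], rc[2]))), AutoFiniteDiff(), [1.0, 2.0])

# The reinterpreted output is laid out as [Re f1, Im f1, Re f2, Im f2] and the
# input as [x, y], so the first column of J holds the x-derivatives:
fprime = [J[2k-1, 1] + im * J[2k, 1] for k in 1:2]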


I’m not involved with ForwardDiff, but the reason is that differentiation over complex variables is seldom useful. Pretty much all optimization algorithms deal with a vector of real variables. So it doesn’t matter what you have, real numbers, complex numbers, real matrices, complex matrices, you have to write it as a vector of real numbers.

Another reason is that the derivative you get from considering a complex function as a function over a vector of real numbers has little to do with its complex derivative.


Except in quantum control, which is entirely based on complex vectors and matrices.

You can still rewrite everything in real and imaginary parts if you absolutely have to, but it’s a pain, and numerically inefficient. The lack of support for complex numbers in “classical” optimization / AD has been a major stumbling block for using packages from that domain “out of the box” for quantum control.

(Just mentioning that because both @albertomercurio and myself work in quantum control, and that’s probably the motivation for the question in the first place)


Yes, my example is an oversimplification of a more complex case, involving the forward differentiation of the evolution of an open quantum system.

As @goerz was mentioning, quantum systems are described by complex vectors and matrices.

So, is there a way to forward-differentiate a complex function like in the example?

I work in quantum information, and I do need to differentiate complex matrices all the time. The way to do it is to rewrite them as a real vector; specifically, a vector containing the scaled elements of the upper triangle of a Hermitian matrix.
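
For instance, a minimal sketch of that parametrization (my own illustration; hermitian_to_vec and vec_to_hermitian are hypothetical helper names, not from a package):

using LinearAlgebra

# Pack an n×n Hermitian matrix into a real vector of length n^2: the n real
# diagonal entries plus the sqrt(2)-scaled real/imaginary parts of the strict
# upper triangle (the scaling makes the packing an isometry).
function hermitian_to_vec(H::Hermitian)
    n = size(H, 1)
    v = real(eltype(H))[]
    for j in 1:n
        push!(v, real(H[j, j]))
        for i in 1:j-1
            push!(v, sqrt(2) * real(H[i, j]))
            push!(v, sqrt(2) * imag(H[i, j]))
        end
    end
    return v
end

# Inverse map; the element type follows v, so it also works with dual numbers.
function vec_to_hermitian(v::AbstractVector, n::Integer)
    H = zeros(complex(eltype(v)), n, n)
    k = 1
    for j in 1:n
        H[j, j] = v[k]; k += 1
        for i in 1:j-1
            H[i, j] = (v[k] + im * v[k+1]) / sqrt(2); k += 2
            H[j, i] = conj(H[i, j])
        end
    end
    return Hermitian(H)
end

A real-valued function g of a Hermitian matrix can then be differentiated as, e.g., ForwardDiff.gradient(v -> g(vec_to_hermitian(v, n)), hermitian_to_vec(H)).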

I am curious about for which application it would be useful to have the complex derivative.

There’s no way the autodiff engine can divine that your function is holomorphic, and the notion of derivative you need in the general case (a 2N by 2N Jacobian) is very different from the one you want for a holomorphic function (an N by N Jacobian). Hence, this has to be handled by some kind of wrapper function or mode that you manually select when you know the function is holomorphic. I suppose such wrappers might be in scope for DifferentiationInterface if someone wants to implement them—@gdalle?

ForwardDiff is built from the ground up around dual numbers both wrapping reals and themselves being reals, so it’s not surprising that just adding a method to can_dual is not enough to make this work transparently. See ForwardDiff.jl/src/dual.jl at f00147ebd3615497b24bafb71c1f037857ca9408 · JuliaDiff/ForwardDiff.jl · GitHub

can_dual(::Type{<:Real}) = true
can_dual(::Type) = false

struct Dual{T,V,N} <: Real
    [...]

FWIW, Enzyme.jl has built-in support for reverse-mode holomorphic differentiation via the ReverseHolomorphic mode, see FAQ · Enzyme.jl. But for forward mode, it seems like you currently have to make the wrappers yourself, like you did for (reverse mode) Zygote.jl.
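
For the scalar case, that looks something like this (untested sketch based on the FAQ):

using Enzyme

g(z) = z^2

# For a holomorphic function, ReverseHolomorphic returns the complex
# derivative directly; here g'(z) = 2z.
Enzyme.autodiff(ReverseHolomorphic, g, Active, Active(1.0 + 2.0im))[1][1]  # 2.0 + 4.0im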

Here’s a simple wrapper for the scalar case, which should be straightforward to generalize to higher-dimensional cases. The basic idea is that, in pseudo-Julia, holomorphicderivative(f, z) = conj(gradient(real ∘ f, z)), where gradient is a simple “structural” gradient wrt. the Complex struct (equivalently, the \partial/\partial \bar{z} Wirtinger derivative).

julia> using ForwardDiff

julia> function holomorphicderivative(f, z)
           x, y = reim(z)
           dx = ForwardDiff.derivative(x_ -> real(f(complex(x_, y))), x)
           dy = ForwardDiff.derivative(y_ -> real(f(complex(x, y_))), y)
           return complex(dx, -dy)
       end
holomorphicderivative (generic function with 1 method)

julia> holomorphicderivative(log, 1.0 + 1.0im)
0.5 - 0.5im

The GTPSA.jl backend supports complex numbers and works just fine with this:

using DifferentiationInterface, FiniteDiff, GTPSA

f(z) = [z^3 + exp(im*z), exp(im*z^2) + 1]
f2(z) = f(z[1])
_, j1 = DifferentiationInterface.value_and_jacobian(f2, AutoFiniteDiff(), [1.0+2.0im])
_, jg = DifferentiationInterface.value_and_jacobian(f2, AutoGTPSA(), [1.0+2.0im])

To put the quantum control problem as briefly as possible, you have your Schrödinger or Liouville equation of the form

i \frac{\partial}{\partial t} \vert{\Psi(t)}\rangle = \hat{H}(\epsilon(t)) \vert\Psi(t)\rangle

where \vert{\Psi(t)}\rangle is a complex vector, \hat{H} is a complex matrix, and \epsilon(t) is a real-valued control field (you know this, but just to keep things clear for the general audience). You have some real-valued functional J(\vert\Psi(\epsilon(t), t=T)\rangle) and you want to find the \epsilon(t) that minimizes J. To calculate the appropriate gradient \partial J/\partial \epsilon(t), you’ll need the complex matrix-calculus derivative \partial J/\partial \langle \Psi(T) \vert (derivative of a real scalar w.r.t. a complex vector). See Quantum Optimal Control via Semi-Automatic Differentiation – Quantum for the full details.

If you want to get the gradient fully with AD, i.e., by simulating the dynamics under the Schrödinger equation with DifferentialEquations or something similar and then plugging the resulting \vert \Psi(T)\rangle into the functional J, you are out of luck if your AD framework does not support complex numbers (Zygote does, with some bugs because the complex case is less tested, but others don’t).

In both cases, you can rewrite the complex vector \vert\Psi\rangle and the complex matrix \hat{H} to objects of twice the dimension (concatenating real and imaginary part), but having to write the conversion routines is annoying, and more importantly, the extended formulation contains redundant information and slows down the simulation pretty significantly. When you’re doing thousands or millions of iterations of gradient descent based on that, with runtimes of multiple days, that’s not ideal, and you really want everything to just work with the straightforward complex objects.
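
Concretely, with \hat{H} = A + iB and \vert\Psi\rangle = u + iv (A, B real matrices, u, v real vectors), the Schrödinger equation above becomes the doubled real system

\frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} B & A \\ -A & B \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}

where A and B are each stored and multiplied twice, which is the redundancy mentioned above.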


DI supports holomorphic Jacobians with FiniteDiff, and we could possibly add other backends if they work, but ForwardDiff certainly doesn’t (as I realized in feat: allow and test holomorphic derivatives by gdalle · Pull Request #687 · JuliaDiff/DifferentiationInterface.jl · GitHub).
As for non-holomorphic complex functions, they aren’t really in scope at the moment (although I think I added the case of real-valued functions for Optim.jl).


To be more precise, DI will assume that your function is holomorphic, because it is the only case where conventions are not ambiguous. This is specified in the “Limitations” docs: Limitations · DifferentiationInterface.jl

I have test scenarios for holomorphic functions, which I only run with FiniteDiff at the moment. We could definitely add more backends to the tests if needed (whether the functionality is supported depends more on the backend itself than it does on DI, except when the operator in question only exists in DI).


Thanks for the explanation. So you don’t in fact want the complex derivative, you want to picture complex numbers as pairs of real numbers and take the derivative over them. That makes perfect sense, and one could add support for it in ForwardDiff.


Well, that’s only the beginning of the story, because the representation of the derivative in the non-holomorphic case has several variants, none of which is unambiguously the right one.


What you’re describing here is just a structural gradient of the objective with respect to the complex (dual) vector. You probably want to use reverse mode for that, and any reverse-mode autodiff engine should just give you what you want without any extra fuss, no? It’s been a while since I touched Zygote, but Enzyme certainly will. And if your end-to-end function is actually real-to-real (\epsilon(t) \rightarrow J), AD systems should be able to deal transparently with the intermediate complex numbers—they’re no different from any other struct for chain rule purposes.

Also, since a (nonconstant) complex-to-real map can never be holomorphic, this problem is actually orthogonal to the holomorphic derivative desired in the OP.

The doubling-up boilerplate, ambiguities, and overlapping cases only enter the fray when you have non-holomorphic complex-to-complex functions.

I think the structural gradient of complex-to-real functions described by @goerz is equally unambiguous. This and the holomorphic derivative are the two “nice” cases you may encounter when working with derivatives of complex numbers.


One would hope, but even for that there are two conventions. I picked the one that makes most sense for optimization when I sought feature parity for Optim.jl to switch to DI. Again, it is only tested on a limited set of backends though, and it could use a second pair of eyes.


Thanks everyone for the comments!

Aside from the discussion on complex derivatives, here is a brief recap of what I understood about forward differentiation of complex functions in Julia.

ForwardDiff.jl

This package, although the most used one, does not seem to fully support complex numbers. One way is the holomorphicderivative wrapper shown above.

It seems to work for scalar functions, but what if the function returns a complex vector? How would I extend it to return the Jacobian?
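
One possible generalization (my own untested sketch, not an official ForwardDiff API): for holomorphic f, the complex Jacobian equals \partial f/\partial x (with z = x + iy and y held fixed), so one can differentiate with respect to the real parts only and stack the real and imaginary parts of the output:

using ForwardDiff

function holomorphicjacobian(f, z::AbstractVector{<:Complex})
    x, y = real.(z), imag.(z)
    function g(x_)
        w = f(complex.(x_, y))
        return vcat(real.(w), imag.(w))  # stack Re and Im so the output is real
    end
    J = ForwardDiff.jacobian(g, x)       # (2m) × n real Jacobian
    m = size(J, 1) ÷ 2
    return J[1:m, :] .+ im .* J[m+1:2m, :]
end

holomorphicjacobian(f2, [1.0 + 2.0im])

This works because ForwardDiff is happy to push dual numbers through intermediate Complex values, as long as the top-level inputs and outputs are real.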

GTPSA.jl

And indeed it works well. However, from what I understand, this method expands the function as a power series, truncating it at some order. So it is not exact, right?

Enzyme.jl

This package supports forward differentiation, and it is also the fastest. The documentation FAQ is very detailed on this, and the following example seems to work:

using Enzyme

Enzyme.jacobian(Forward, f2, [1.0 + 2.0im])

What you’re calling “the” complex derivative is just a constraint on the Jacobian of a measure-zero subset of complex → complex functions that lets you treat them as single-variable functions.

The complex numbers fundamentally are a two-dimensional vector space, and one does not need to necessarily treat them as pairs of real numbers to talk about the full richness of that space. The Wirtinger representation for instance talks about the pair of derivatives \partial / \partial z and \partial / \partial z^* and one can talk about the complex jacobian using that 2D complex basis as well (or any other basis one feels like taking on the complex plane).
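
For reference, with z = x + iy the Wirtinger derivatives are

\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i \frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial z^*} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right),

and f is holomorphic precisely when \partial f/\partial z^* = 0 (the Cauchy–Riemann equations).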

While important, I think people wildly over-focus on holomorphic functions to a point where it seems to become their very definition of what a complex function is.

What I’m calling the complex derivative is what everybody calls the complex derivative, including OP. It’s important to use standard nomenclature in order to communicate clearly.

The complex numbers are a two-dimensional real vector space. So yes, you can use different bases to represent them, but it always boils down to representing them as a pair of real numbers.


The OP clearly communicated wanting a holomorphic derivative. You’re the only one in this thread that is using the misleading convention of just calling a holomorphic derivative “the” derivative.

No, the derivatives are exact. TPSA is forward-mode automatic differentiation: truncating at some order only limits the highest derivative you can extract, while the derivatives up to that order are computed exactly (to floating-point precision). The TPS number type is basically a Dual number, but optimized to make high-order AD with many variables much, much faster than you would get by nesting Dual numbers.
