As DifferentiationInterface.jl gains more traction, I find myself forced to confront the issue of AD with complex numbers. Right now they are not officially supported, but they should be, and I wonder what the community needs.
For instance, the derivative of a function \mathbb{C} \to \mathbb{C} can be represented as a matrix in \mathbb{R}^{2 \times 2}, or by a pair in \mathbb{C}^2, but not by a single complex number (except in the holomorphic case); a concrete sketch follows the table below. I need to make such a decision for each operator, i.e. fill in the following table and figure out, for each cell:
- Should I error or return something?
- What is the correct convention for the result?
| operator | \mathbb{R} \to \mathbb{C} | \mathbb{C} \to \mathbb{R} | \mathbb{C} \to \mathbb{C} |
|---|---|---|---|
| DI.derivative (scalar to scalar) | ? | ? | ? |
| DI.derivative (scalar to array) | ? | ? | ? |
| DI.gradient (array to scalar) | ? | ? | ? |
| DI.jacobian (array to array) | ? | ? | ? |
| DI.hessian (array to scalar) | ? | ? | ? |
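To make the representation question concrete, here is a minimal sketch (plain Julia with ForwardDiff.jl, not DI's API; real_jacobian is a name I made up) that views a \mathbb{C} \to \mathbb{C} function as a map \mathbb{R}^2 \to \mathbb{R}^2 and computes the full 2 \times 2 real Jacobian, which is the most general representation of the derivative:

```julia
using ForwardDiff

# View f : C -> C as a map R^2 -> R^2 acting on (x, y) = (real(z), imag(z)).
function real_jacobian(f, z::Complex)
    g(v) = let w = f(complex(v[1], v[2]))
        [real(w), imag(w)]
    end
    return ForwardDiff.jacobian(g, [real(z), imag(z)])  # 2x2 real matrix
end

f(z) = z^2 + conj(z)               # not holomorphic because of the conj
J = real_jacobian(f, 1.0 + 2.0im)  # full derivative information
```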
I also want to figure out how holomorphic behavior should be signaled to DI, if it can be exploited. Perhaps a function wrapper, or a keyword argument à la JAX?
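As a strawman for the wrapper option (purely hypothetical, nothing like this exists in DI today), it could be as simple as:

```julia
# Hypothetical wrapper: a user promise that f is holomorphic.
struct Holomorphic{F}
    f::F
end

(h::Holomorphic)(x...) = h.f(x...)
```

A call like derivative(Holomorphic(f), backend, z) could then be allowed to return a single complex number for a \mathbb{C} \to \mathbb{C} function, while a plain f keeps the more general convention.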
I’m partial to the CR calculus approach where you use a pair of complex numbers \partial f / \partial z and \partial f / \partial \bar{z}.
I find this easiest to use in the optimization context, where the functions ultimately map to \mathbb{R} (in which case the two derivatives are conjugates of each other), because it maps very naturally onto a gradient that can be passed to optimization software. It also simplifies nicely in the holomorphic case (where the antilinear term \partial f / \partial \bar{z} = 0), and it lets you keep thinking in terms of complex numbers (rather than pairs of real and imaginary parts, which is often conceptually awkward).
But I’m not sure how this meshes with the internals of AD systems.
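For reference, the CR (Wirtinger) pair can be recovered from the real partials, so any backend that can produce the 2 \times 2 real Jacobian from the earlier sketch can also expose this representation. A sketch reusing that hypothetical real_jacobian helper:

```julia
# Wirtinger / CR derivatives from the 2x2 real Jacobian of (x, y) -> (u, v):
#   df/dz    = (df/dx - i*df/dy) / 2
#   df/dzbar = (df/dx + i*df/dy) / 2
function wirtinger(f, z::Complex)
    J = real_jacobian(f, z)             # helper from the previous sketch
    df_dx = complex(J[1, 1], J[2, 1])   # du/dx + i*dv/dx
    df_dy = complex(J[1, 2], J[2, 2])   # du/dy + i*dv/dy
    return (df_dx - im * df_dy) / 2, (df_dx + im * df_dy) / 2
end

# Holomorphic example: the second component (df/dzbar) comes out as zero.
wirtinger(z -> z^2, 1.0 + 2.0im)  # (2.0 + 4.0im, 0.0 + 0.0im)
```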
I am also not sure whether your table should include the JVP of a real function applied to complex vectors. I need this in BifurcationKit.jl and handle it with dispatch for now.
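If I understand the use case correctly (applying the Jacobian of a real function at a real point to a complex tangent, e.g. a complex eigenvector), dispatch can exploit the fact that the Jacobian is a real matrix acting linearly on the tangent. A sketch, where jvp is a stand-in name rather than any existing API:

```julia
using ForwardDiff

# Basic real JVP via a directional derivative: J_f(x) * v.
jvp(f, x::AbstractVector{<:Real}, v::AbstractVector{<:Real}) =
    ForwardDiff.derivative(t -> f(x .+ t .* v), 0.0)

# Complex tangent: split it, since J(x) * (vr + im*vi) = J(x)*vr + im*(J(x)*vi).
jvp(f, x::AbstractVector{<:Real}, v::AbstractVector{<:Complex}) =
    jvp(f, x, real.(v)) .+ im .* jvp(f, x, imag.(v))

fvec(x) = [x[1]^2 + x[2], x[1] * x[2]]
jvp(fvec, [1.0, 2.0], [1.0 + 1.0im, 0.0 + 2.0im])
```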
Actually using this would require a special Newton method and compiler analysis that do not currently exist. What we would need is:
- Something in FunctionProperties.jl/DI for isholomorphic. In theory this can be done by constant propagation of zeros through the chain rules: if the rules were written in the CR calculus form, you could maybe analyze whether the derivative with respect to \bar{z} is a structural zero.
- Functionality in NonlinearSolve.jl, so that if isholomorphic holds, it creates and uses a Jacobian of a different size and runs the alternative Newton method based on that property (roughly as illustrated below).
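A very rough illustration of the trait idea and the resulting choice of Jacobian shape (isholomorphic and jacobian_size are made-up names, not existing FunctionProperties.jl or NonlinearSolve.jl API):

```julia
# Hypothetical trait, conservative default: assume non-holomorphic.
isholomorphic(f) = false

# A user (or a future compiler analysis) opts in for a specific residual:
myf(z) = z .^ 3 .- 1
isholomorphic(::typeof(myf)) = true

# A Newton implementation could then pick the linearization it builds:
# complex n-by-n Jacobian if holomorphic, split real 2n-by-2n otherwise.
jacobian_size(f, n) = isholomorphic(f) ? (n, n) : (2n, 2n)
```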
I’ve wanted such a thing for almost a decade now, but making sure we can properly declare something as holomorphic has been the hard part. The other option is to make it a user-set trait that defaults to not holomorphic, and thus have Newton default to the 2x2 (real-split) form. The downside is that a lot of “standard” ODE cases would then take a pretty big performance hit, and pretty much no one would know to actually set the trait.
So for now, we do the holomorphic Newton method simply because it requires zero code changes: if you slap complex numbers into a standard Newton code, that’s what you get. If the user wants the other case, they should write their nonlinear solve using a real-valued u0 with the real and imaginary parts, and construct the complex numbers inside their f (sketched below). This path of “try the simple thing, error often, but let the user solve it on their own with a simple workaround” has been the okay equilibrium we’ve sat in.
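For concreteness, here is a minimal sketch of that workaround using NonlinearSolve.jl's out-of-place f(u, p) interface (the toy residual fc and the wrapper freal are just an example; the splitting and reassembly are entirely the user's responsibility):

```julia
using NonlinearSolve

# Complex residual we actually care about; the conj makes it non-holomorphic.
fc(z, p) = z .^ 2 .+ conj.(z) .- p

# Real-valued wrapper: u stacks the real and imaginary parts of z.
function freal(u, p)
    n = length(u) ÷ 2
    z = complex.(u[1:n], u[n+1:end])   # rebuild the complex unknowns
    w = fc(z, p)
    return vcat(real.(w), imag.(w))    # split the complex residual back out
end

z0 = [1.2 + 0.8im]
prob = NonlinearProblem(freal, vcat(real.(z0), imag.(z0)), 1.0 + 1.0im)
sol = solve(prob, NewtonRaphson())
z = complex.(sol.u[1:1], sol.u[2:2])   # should be close to 1.0 + 1.0im
```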