Hey there, so I’m playing around with making my own symbolic math package and I’m attempting to do forward mode automatic differentiation on my symbolic functions. The problem is that I am struggling to find a clean way around a problem called perturbation confusion.
First, some background:
Forward Mode Automatic Differentiation
To solve certain math problems we’ve invented new types of numbers. One example is the imaginary numbers which include a unit imaginary i which is defined such that i^2 = -1. Similarly, to solve differentiation problems it’s useful to use what people in computing call Dual Numbers defined by a new types of number \epsilon with the algebraic property that \epsilon^2 = 0. It turns out that for a sufficiently well behaved function f, if one evaluates f(x + \epsilon)$ they will find that equals f(x) + \epsilon f'(x) where f'(x) is the first derivative of f. If one constructs in the computer a type which has the above properties and defines multiple dispatch for all the relevant mathematical functions so that they know how to deal with dual numbers then one can defined a derivative operator as
function D(f::Function)
x -> (f(x + ϵ)).infinitesimal
end
where z.infinitesimal
just extracts the terms in z
which are multiplied by ϵ
.
Hence, the derivative of a function f(x)
is found simply by calling D(f)(x)
Where am I at?
I have managed so far to define multiple dispatch methods on most of the basic mathematical functions I care about so that they can accept symbolic arguments and dual number arguments and have (as far as I can tell so far) correct behaviour and I have a rudimentary simplification system that is able to apply mathematical identities and simplify certain symbolic mathematical expressions. Things are working pretty well for the early stages.
My dual number type currently is defined basically like
struct Dual_Number
real::Union{Symbolic, Number}
infinitesimal::Union{Symbolic, Number}
end
and for convenience I just define ϵ = Dual_Number(0,1)
Great so whats the problem?
Perturbation Confusion
Consider trying to take a second derivative with this system.
(D^2)(f) == D(D(f))
== D(x -> (f(x + ϵ)).infinitesimal)
== y -> ((x -> (f(x + ϵ)).infinitesimal)(y + ϵ)).infinitesimal
== y -> 0
Uh-oh!
The problem here is that when I try to take a second derivative it has to go through the first derivative and so it essentially is trying to feed in 2ϵ
to f
and then take the infinitesimal part twice which is always going to be zero since the first time you extract the infinitesimal part you get just the real part which by definition has no infinitesimal part. This is known as perturbation confusion because the derivative system doesn’t know which infinitesimal part its supposed to extract.
After encountering this problem, the main strategy I’ve found in the literature for avoiding perturbation confusion ( though it’s also not perfect) has been to ‘tag’ each dual number and then only extract the dual number components you meant to extract each time you take the derivative.
ie. instead of ϵ we want to write \epsilon_1, \epsilon_2, \epsilon_3, … and then change our arithmetic rules such that \epsilon_i \epsilon_j = 0 only if i = j and otherwise we just leave \epsilon_i \epsilon_j alone.
Then if we imagine implementing the proper computer implementation of ϵᵢ
that obeys the above properties then one could simply implement a derivative operator as
function D(f::Function)
global tag += 1
x -> extract_infinitesimal(f(x + ϵ(tag)), tag)
end
where extract_infinitesimal(z, tag)
only extracts the infinitesimal part of z
associated with the index tag
and epsilon(tag)
generates a new dual number with tag index tag
. This should then be immune to most forms of perturbation confusion.
The Question
Okay, sorry for the long winded preamble. My problem is that I am really struggling to see how best to implement these tagged dual numbers. I’ve tried to look at the source code for various packages that use this tagging method and have failed to understand whats going on. My intuition would be that the Dual_Number
types should have a slot that carries tag information but then I’m seeing that I’ll need a slot in the Dual_Number
type that corresponds to each tag because I need to know the difference between x + 2ϵ₁ + ϵ₂
and x + ϵ₁ + 2ϵ₂
. Further, there’s the problem of cross terms so I need some what of dealing with terms like x + 2ϵ₁ϵ₂
without setting that infinitesimal part to zero and have a way to extract out x + 2ϵ₁
or x + 2ϵ₂
as needed.
Maybe there is some magic one can do with Julia’s type system that generates a new type for each combination of tags and would be friendly to my notions of multiple dispatch on dual numbers but I’m really not very well versed in the subtleties of the type system.
Maybe theres a better way I haven’t thought of? If anyone has taken the time to read this and has a suggestion I’d love to hear it.