Issue while using nested ForwardDiff.Dual

Hi!

I’ve been working on some code that makes use of the dual number implementation in ForwardDiff.Dual. I end up using the dual numbers directly rather than through the functions provided by ForwardDiff as I’ve found it allows for more flexibility and possibilities for optimization on my specific application. In this regard, I can set up a Dual number and perform a multiplication with a scalar,

a1 = 1
a2 = 2
a3 = 3
A = ForwardDiff.Dual{:tag1}(a1,a2,a3)

b1 = 4

b1*A

This creates a dual number with a value of 1 and two partials equal to 2 and 3, and correctly returns a Dual with a value of 4 and partials 8 and 12. Now, I want to further make use of the machinery I have to allow the value and individual partials themselves to be Duals rather than floats, for example

a1 = ForwardDiff.Dual{:tag1}(1.0,1.0)
a2 = ForwardDiff.Dual{:tag1}(2.0,1.0)
a3 = ForwardDiff.Dual{:tag1}(3.0,1.0)
A = ForwardDiff.Dual{:tag2}(a1,a2,a3)

b1 = 4

b1*A

which works as I’d want. The broad idea here is that my code can receive a parameter x<:Real, and compute something based on that parameter which internally uses automatic differentiation. If I provide a Float64 value of x then I would get a single float as a result. If instead I provide x as a Dual number ForwardDiff.Dual{:tag1}(a,1.0) (where a is a Float64), then the result would be a Dual number containing the result for a along with the partial derivative with respect to the parameter.

Now my problem is when I try to have b1 itself be a dual number,

a1 = ForwardDiff.Dual{:tag1}(1.0,1.0)
a2 = ForwardDiff.Dual{:tag1}(2.0,1.0)
a3 = ForwardDiff.Dual{:tag1}(3.0,1.0)
A = ForwardDiff.Dual{:tag2}(a1,a2,a3)

b1 = ForwardDiff.Dual{:tag1}(1.0,1.0)

A*b1

I was naively expecting that the result of this operation would be a Dual number with a value a1*b1 (which would be a dual with tag :tag1), while the partials would be equal to a2*b1 and a3*b1 (also duals with tag :tag1). However this throws an exception:

ERROR: Cannot determine ordering of Dual tags tag1 and tag2

Am I abusing too much the intended use of ForwardDiff.Dual in here or is there a way to do this? I’ve made use of the tags in Dual somewhat arbitrarily here, just to differentiate the dual number types, not sure if applying tags smartly could sort this out (looking at the implementation I’m actually unsure how tagging is used internally).

2 Likes

Here is a less abstract example. The function newton_solver below just finds the zero for a*x+b, using c as the first guess. It does a one-step newton solver.

# finds the zero for ax+b, using c as first guess
function newton_solver(a,b,c::T) where {T<:Real}
    c_dual = ForwardDiff.Dual{:tag1}(c,one(T))
    y = a*c_dual+b
    return c-y.value/y.partials[1]
end

# evaluating with floats
newton_solver(1.0,0.0,5.0) # returns 0.0

Now I could try and feed arguments which are Duals themselves. For example, I can provide c as a Dual, where the partial would correspond to dc/dc=1. As expected, the result is a Dual number with a value of zero, and a partial of zero (the result is independent of the first guess).

# using a Dual for the first guess
c = ForwardDiff.Dual{:tag2}(5.0,1.0)
newton_solver(1.0,0.0,c)  # returns Dual{:tag2}(0.0,0.0), partial is zero as expected (solution independent of c)

Things break down if I try to evaluate the function with all parameters being Duals

# using all as dual to get all partials
a = ForwardDiff.Dual{:tag2}(1.0,1.0,0.0,0.0)
b = ForwardDiff.Dual{:tag2}(0.0,0.0,1.0,0.0)
c = ForwardDiff.Dual{:tag2}(5.0,0.0,0.0,1.0)
newton_solver(a,b,c)  # ERROR: Cannot determine ordering of Dual tags tag1 and tag2

which also makes it a bit more clear what the problem is. When the code performs a multiplication between a Dual{:tag1} and a Dual{:tag2}, it does not know what Dual it should produce. This Dual ordering is checked in the operations of ForwardDiff.Dual in the dual.jl file, and in particular there is a definition for the < operation, which is the source of the issue

"""
    ForwardDiff.≺(a, b)::Bool

Determines the order in which tagged `Dual` objects are composed. If true, then `Dual{b}`
objects will appear outside `Dual{a}` objects.

This is important when working with nested differentiation: currently, only the outermost
tag can be extracted, so it should be used in the _innermost_ function.
"""
≺(a,b) = throw(DualMismatchError(a,b))

This will always throw an exception. Don’t know if internally somewhere ForwardDiff creates different definitions for < than can actually allow for nested use of Duals.

And after fiddling with this a bit more, I found out you can do this by abusing a bit the tagging system. First I define a function that creates a ForwardDiff Tag using UUIDs to uniquely identify them, and then runs the ForwardDiff.tagcount function to properly order it. If I use tags generated this way, then things work as I expect.

using ForwardDiff
using UUIDs
function simple_tag()
    x = uuid1().value
    tag = ForwardDiff.Tag{x,nothing}
    ForwardDiff.tagcount(tag) # trigger generated function
    return tag
end
tag1 = simple_tag()
tag2 = simple_tag()

# finds the zero for ax+b, using c as first guess
function newton_solver(a,b,c::T) where {T<:Real}
    c_dual = ForwardDiff.Dual{tag2}(c,one(T))
    y = a*c_dual+b
    return c-y.value/y.partials[1]
end

# using all as dual to get all partials
a = ForwardDiff.Dual{tag1}(2.0,1.0,0.0,0.0)
b = ForwardDiff.Dual{tag1}(3.0,0.0,1.0,0.0)
c = ForwardDiff.Dual{tag1}(3.0,0.0,0.0,1.0)
result = newton_solver(a,b,c)
@show result.value, result.partials

In this case I am finding the zero of a*x+b, with a=2 and b=3. The solution is computed using a one-step newton solver with an initial guess c=5. The result is of course x=-b/a, and for a linear equation one step of a Newton solver gets the exact solution, so it is independent of c. The example below gives me the exact thing I’m looking for, the entries of the partials are respectively dx/da=b/a^2, dx/db=-1/a and dx/dc=0.

Probably I can remove the need to provide an integer by having julia create a unique integer each time

On a closer look, I see this works without the call to tagcount. Which makes me even more clueless about the innards of ForwardDiff!