Nesting ForwardDiff inside ReverseDiff?

question
differentiation

#1

Let’s say I have a function f over an input x and a set of parameters w. For example:

f = (x, w) -> sum(x .* w)

I would like to compute the cross-terms of the Hessian of f. That is, I would like to compute the gradient of g with respect to w, where g itself is the gradient of f with respect to x.

For the sake of argument, let’s say that the size of x is small (~10) and w is large (1000 or more). This is not true for the above f, but we can pretend.

One way I could approach this would be to define g using ForwardDiff (since x is small):

g = (x, w) -> ForwardDiff.gradient(x -> f(x, w), x)

and then take the jacobian of g w.r.t. w using ReverseDiff (since w is large):

ReverseDiff.jacobian(w -> g(x, w), w)

but this fails with:

MethodError: Cannot `convert` an object of type ForwardDiff.Dual{2,Float64} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.

 in increment_deriv! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:34 [inlined]
 in broadcast_increment_deriv!(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Array{ForwardDiff.Dual{2,Float64},1}, ::CartesianIndex{1}, ::CartesianIndex{1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:142
 in special_reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:473
 in reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:93
 in reverse_pass!(::Array{ReverseDiff.AbstractInstruction,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:87
 in seeded_reverse_pass!(::Array{Float64,2}, ::Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/utils.jl:51
 in seeded_reverse_pass! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/tape.jl:47 [inlined]
 in jacobian!(::Array{Float64,2}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:122
 in jacobian! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:106 [inlined]
 in jacobian(::Function, ::Array{Float64,1}, ::ReverseDiff.JacobianConfig{ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Void}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:25
 in jacobian(::Function, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:23

So, I have two questions:

  1. Am I just doing it wrong? Have I just forgotten everything I know about calculus?
  2. Should I expect running ForwardDiff inside ReverseDiff to work? Is there another way to use these tools to get what I want?

Oops, that was more than two questions.

Thanks!


#2
  1. In case you don’t understand the error message: it says that dual numbers (the Dual type) from ForwardDiff.jl can’t be converted to the Float64 that ReverseDiff.jl expects. This isn’t surprising, but given that ReverseDiff.jl uses ForwardDiff.jl internally, there may be a backdoor to do what you want. You may want to open an issue on ReverseDiff.jl’s repo to clarify this.
  2. Am I correct that you have functions f and g like this:
f(x, w)
g(x, w) = df(x, w)/dx

and you want to find function h like this:

h(x, w) = dg(x, w)/dw = d²f(x, w) / (dw dx)

Can you tell more about f(x, w)? Is it a closed-form algebraic expression or more complex Julia code?
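For what it's worth, the toy f from the original post makes a convenient sanity check: since f(x, w) = sum(x .* w), the mixed derivative d²f/(dw dx) is just the identity matrix. Here's a sketch that nests ForwardDiff inside itself (which handles nested Duals natively), just to confirm the math:

```julia
using ForwardDiff, LinearAlgebra

f = (x, w) -> sum(x .* w)
x, w = rand(3), rand(3)

# g(x, w) = df/dx; for this particular f it simply returns w
g = (x, w) -> ForwardDiff.gradient(x -> f(x, w), x)

# h(x, w) = dg/dw; for this particular f it is the 3x3 identity matrix
h = ForwardDiff.jacobian(w -> g(x, w), w)
h == Matrix{Float64}(I, 3, 3)  # true
```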


#3

This should actually work - I believe you’ve run into a ReverseDiff bug. Definitely worth filing an issue! It seems like ReverseDiff isn’t promoting element types correctly for operations between, for example, Array{<:Dual} and TrackedArray{<:Float}.

It’s actually quite tough to get these kinds of promotion rules right without a tagging system for Duals, so in some cases ReverseDiff employed hand-wavy hacks to steer the behavior of nested perturbations. While those stopgap solutions didn’t directly cause this particular bug, the lack of a robust solution makes bugs like this hard to prevent. Fortunately, now that ForwardDiff’s Dual type supports a tagging system, I can cook up a rigorous implementation of the correct behavior.

As an aside, ReverseDiff’s @forward macro only covers scalar derivatives at the moment, but I’d like it to eventually handle gradients and Jacobians as well. ReverseDiff is undergoing an extensive overhaul this summer, so stay tuned…
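For reference, here is roughly what the @forward macro looks like in use, per ReverseDiff's documented scalar-function form (sigmoid is just a stand-in example here):

```julia
using ReverseDiff

# Tell ReverseDiff to differentiate this scalar function with ForwardDiff
# rather than tracing through its body on the tape.
ReverseDiff.@forward sigmoid(x::Real) = 1 / (1 + exp(-x))

w = rand(4)
grad = ReverseDiff.gradient(w -> sum(sigmoid.(w)), w)
```

The derivative of sigmoid is s * (1 - s), so every entry of grad lands in (0, 0.25].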


#4

@dfdx thanks! I do understand the error in a general sense, but I haven’t dug around inside ReverseDiff enough to know why it’s trying that particular conversion. And yes, your explanation of my problem is exactly right (and clearer than my phrasing).

My function f is a multi-layer perceptron, implemented in Julia. The presence of conditionals (inside the ReLU nonlinearities) means that I can’t write it as an algebraic expression. On the other hand, it’s really not that hard to just work out g(x, w) by hand in my particular case.
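Concretely, here's a toy version of what I mean (hypothetical layer sizes and weight packing; the real network is bigger, but the shape of the problem is the same):

```julia
using ForwardDiff

relu(x) = max(x, zero(x))  # the conditional lives inside max

# Toy one-hidden-layer perceptron with all weights packed into one vector w.
function f(x, w)
    n, h = length(x), 2
    W1 = reshape(w[1:h*n], h, n)         # hidden layer weights
    W2 = reshape(w[h*n+1:h*n+h], 1, h)   # output layer weights
    sum(W2 * relu.(W1 * x))
end

x = [0.5, -0.3]
w = rand(2*2 + 2)
g = ForwardDiff.gradient(x -> f(x, w), x)  # df/dx works fine on its own
```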


#5

@jrevels OK, issue filed: https://github.com/JuliaDiff/ReverseDiff.jl/issues/67. Thanks for all your hard work on this!


#6

@jrevels any updates on this? I saw that ReverseDiff updated to support the tagging system in ForwardDiff, but it looks like this particular issue isn’t solved. I’ve spent a couple hours trying to figure out the right promotion mechanism to fix it, but I’m pretty stumped. Do you have any suggestions?