Applying Reverse-Mode AD to ForwardDiff Derivatives/Gradients

I am building a neural network and want to be able to differentiate the network with respect to the model input. I would then like to compute the gradient (with respect to the network weights and bias) of this derivative. I can get the input derivatives by doing the following:

using Flux
using ForwardDiff

u(x, W, b) = sum(σ.(W*x + b))

# ∂ₓu (first component of the gradient of u with respect to the input x)
uₓ(x, W, b) = ForwardDiff.gradient(z->u(z, W, b), x)[1]

# ∂ₓₓu (second derivative with respect to the first input component)
uₓₓ(x, W, b) = ForwardDiff.gradient(z->uₓ(z, W, b), x)[1]

Now, I want to find the gradient w.r.t. the network parameters. I can do this with ForwardDiff:

W₀ = rand(1,2)
b₀ = rand(1)
x₀ = rand(2)

# gradient of uₓₓ with respect to the weights W₀
∇W₀ = ForwardDiff.gradient(W₀) do W
      uₓₓ(x₀, W, b₀)
end
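
The same pattern gives the gradient w.r.t. the bias (a quick sketch following the same do-block call as above; ∇b₀ is just an illustrative name):

∇b₀ = ForwardDiff.gradient(b₀) do b
      uₓₓ(x₀, W₀, b)
end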

For this computation, however, I would like to use reverse-mode AD. Using the Flux gradient function instead returns a grads object filled with nothing (the call I am making is sketched at the end of this post). I know that this question has surfaced in some form or another before, but I have not been able to figure out how to get it working. I am still very new to all of this, but the fact that I can compute the gradient with ForwardDiff makes me think it should also be possible with reverse-mode AD.

I have seen suggestions to define a custom chain rule, but I am struggling to do so: I am not familiar with ChainRules.jl, and I am also unsure how to write an rrule that can deal with a ForwardDiff call (my rough attempt is likewise sketched at the end of this post). I know that @ChrisRackauckas suggested looking over the ZygoteRules given at

but my lack of experience is making it hard to translate this to ChainRules. Anyways, if anyone has any suggestions I would greatly appreciate it. Thanks in advance!
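
For reference, the reverse-mode call I am attempting looks roughly like this (I am using the implicit Flux.params style; θ and grads are just illustrative names):

θ = Flux.params(W₀, b₀)
grads = Flux.gradient(() -> uₓₓ(x₀, W₀, b₀), θ)

# both grads[W₀] and grads[b₀] come back as nothing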
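
And this is my very rough guess at what a custom rule for uₓₓ might look like, with the pullback falling back on ForwardDiff for the sensitivities. I am not at all sure this is correct, or even the right approach, so please treat it as a sketch of what I am trying rather than something that works:

using ChainRulesCore

# idea: let reverse mode treat uₓₓ as a black box and compute its input
# sensitivities with ForwardDiff inside the pullback
function ChainRulesCore.rrule(::typeof(uₓₓ), x, W, b)
    y = uₓₓ(x, W, b)
    function uₓₓ_pullback(ȳ)
        # uₓₓ is scalar-valued, so each cotangent is ȳ times the gradient
        # of uₓₓ with respect to that argument
        ∂x = ȳ .* ForwardDiff.gradient(z -> uₓₓ(z, W, b), x)
        ∂W = ȳ .* ForwardDiff.gradient(w -> uₓₓ(x, w, b), W)
        ∂b = ȳ .* ForwardDiff.gradient(β -> uₓₓ(x, W, β), b)
        return NoTangent(), ∂x, ∂W, ∂b
    end
    return y, uₓₓ_pullback
end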