Say I have a function f: \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^n, i.e. f(\theta, x) is a vector for a given set of parameters \theta and a scalar x. I would now like to calculate the gradient \nabla_\theta L(\theta, x) of a loss function of the form L(\theta, x) = g\left(\frac{\partial f(\theta, x)}{\partial x}\right) \in \mathbb{R}, where I need to be able to access the intermediate result \frac{\partial f(\theta, x)}{\partial x} \in \mathbb{R}^n in order to calculate L(\theta, x).
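For concreteness, if we take g = \mathrm{sum} (just one possible choice of g), the gradient I'm after is the mixed second derivative

\frac{\partial L(\theta, x)}{\partial \theta_j} = \frac{\partial}{\partial \theta_j} \sum_{i=1}^n \frac{\partial f_i(\theta, x)}{\partial x} = \sum_{i=1}^n \frac{\partial^2 f_i(\theta, x)}{\partial \theta_j \, \partial x},

i.e. a reverse-mode pass over \theta applied to a forward-mode derivative in x.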
Is it possible to accomplish this within Flux? Here’s a minimal not working example (MNWE?) of what I’d like to do:
using Flux, ForwardDiff
f = Chain(x -> fill(x, 3), Dense(3, 3, softplus)) # maps a scalar x to a vector in R^3
df(x) = ForwardDiff.derivative(f, x) # forward-mode derivative ∂f/∂x
x = rand()
f(x) #Works
df(x) #Works
g = gradient(() -> sum(df(x)), params(f)) #Fails
which fails with the following stack trace:
ERROR: setindex! not defined for ForwardDiff.Partials{1,Float64}
Stacktrace:
[1] error(::String, ::Type) at ./error.jl:42
[2] error_if_canonical_setindex(::IndexLinear, ::ForwardDiff.Partials{1,Float64}, ::Int64) at ./abstractarray.jl:1082
[3] setindex! at ./abstractarray.jl:1073 [inlined]
[4] (::getfield(Zygote, Symbol("##976#978")){ForwardDiff.Partials{1,Float64},Tuple{Int64}})(::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/lib/array.jl:32
[5] (::getfield(Zygote, Symbol("##2585#back#980")){getfield(Zygote, Symbol("##976#978")){ForwardDiff.Partials{1,Float64},Tuple{Int64}}})(::Float64) at /home/troels/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[6] partials at /home/troels/.julia/packages/ForwardDiff/yPcDQ/src/dual.jl:96 [inlined]
[7] (::getfield(Zygote, Symbol("##153#154")){typeof(∂(partials)),Tuple{Tuple{Nothing},Tuple{Nothing}}})(::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/lib/lib.jl:142
[8] (::getfield(Zygote, Symbol("##283#back#155")){getfield(Zygote, Symbol("##153#154")){typeof(∂(partials)),Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Float64) at /home/troels/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[9] extract_derivative at /home/troels/.julia/packages/ForwardDiff/yPcDQ/src/dual.jl:101 [inlined]
[10] #72 at /home/troels/.julia/packages/ForwardDiff/yPcDQ/src/derivative.jl:81 [inlined]
[11] (::typeof(∂(#72)))(::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/compiler/interface2.jl:0
[12] (::getfield(Zygote, Symbol("##1104#1108")))(::typeof(∂(#72)), ::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/lib/array.jl:134
[13] (::getfield(Base, Symbol("##3#4")){getfield(Zygote, Symbol("##1104#1108"))})(::Tuple{typeof(∂(#72)),Float64}) at ./generator.jl:36
[14] iterate at ./generator.jl:47 [inlined]
[15] collect at ./array.jl:606 [inlined]
[16] map at ./abstractarray.jl:2155 [inlined]
[17] (::getfield(Zygote, Symbol("##1103#1107")){Array{typeof(∂(#72)),1}})(::FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}) at /home/troels/.julia/packages/Zygote/8dVxG/src/lib/array.jl:134
[18] (::getfield(Zygote, Symbol("##2842#back#1109")){getfield(Zygote, Symbol("##1103#1107")){Array{typeof(∂(#72)),1}}})(::FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}) at /home/troels/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
[19] derivative at /home/troels/.julia/packages/ForwardDiff/yPcDQ/src/derivative.jl:81 [inlined]
[20] df at ./REPL[3]:1 [inlined]
[21] (::typeof(∂(df)))(::FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}) at /home/troels/.julia/packages/Zygote/8dVxG/src/compiler/interface2.jl:0
[22] #5 at ./REPL[7]:1 [inlined]
[23] (::typeof(∂(#5)))(::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/compiler/interface2.jl:0
[24] (::getfield(Zygote, Symbol("##38#39")){Zygote.Params,Zygote.Context,typeof(∂(#5))})(::Float64) at /home/troels/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:101
[25] gradient(::Function, ::Zygote.Params) at /home/troels/.julia/packages/Zygote/8dVxG/src/compiler/interface.jl:47
[26] top-level scope at REPL[7]:1
I can, of course, approximate the desired result by using a finite difference approximation of some kind, like
Δf(x, δ = 0.001) = (f(x + δ) - f(x)) ./ δ
Δf(x) #Works
g = gradient(() -> sum(Δf(x)), params(f)) #Works
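A central-difference variant along the same lines should also be differentiable by Zygote, since it still only involves primal evaluations of f (just a sketch of the same workaround, using a hypothetical Δf_central analogous to Δf above):

Δf_central(x, δ = 0.001) = (f(x + δ) - f(x - δ)) ./ (2δ) # second-order accurate in δ
g = gradient(() -> sum(Δf_central(x)), params(f)) # should also work: only primal calls to f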
but it would be great if this weren’t necessary…
Ideas? Suggestions?
Some package information:
(v1.2) pkg> st Flux
Status `~/.julia/environments/v1.2/Project.toml`
[5ae59095] Colors v0.9.6
[3a865a2d] CuArrays v1.4.2
[587475ba] Flux v0.9.0 #master (https://github.com/FluxML/Flux.jl.git)
[e5e0dc1b] Juno v0.7.2
[872c559c] NNlib v0.6.0 #master (https://github.com/FluxML/NNlib.jl.git)
[189a3867] Reexport v0.2.0
[2913bbd2] StatsBase v0.32.0
[e88e6eb3] Zygote v0.4.1
(v1.2) pkg> st ForwardDiff
Status `~/.julia/environments/v1.2/Project.toml`
[163ba53b] DiffResults v0.0.4
[f6369f11] ForwardDiff v0.10.6
[276daf66] SpecialFunctions v0.7.2
[90137ffa] StaticArrays v0.11.1
Note that I’m on Flux#master.