Issue with Zygote over ForwardDiff.derivative

jlmaccal · November 2, 2021, 7:03pm

I’m having some trouble getting Zygote over ForwardDiff.derivative to work.

I’m going to refer to this prior post. The following code used to fail with ERROR: setindex! not defined for ForwardDiff.Partials{1,Float64}. The fix was to define some additional adjoints, as @ChrisRackauckas showed in DiffEqFlux.jl.

using Flux, ForwardDiff

f = Chain(x -> fill(x, 3), Dense(3, 3, softplus))
df(x) = ForwardDiff.derivative(f, x)

x = rand()
f(x) #Works
df(x) #Works
gs = gradient(() -> sum(df(x)), params(f)) #Fails

The code above now runs even without the additional adjoints, but the gradients it returns are nothing.

One of my codes used a similar Zygote-over-ForwardDiff.derivative approach, but it no longer trains because all of the gradients are nothing. Something seems to have changed, but I don’t know where to start. Unfortunately, I don’t have the old Project or Manifest files, so I don’t know which versions I was using.

ChrisRackauckas · November 2, 2021, 7:04pm

what’s your full MWE?

jlmaccal · November 2, 2021, 7:27pm

The example above fails in the way I describe: gs should hold the gradients with respect to params(f), but it contains nothing instead.

jlmaccal · November 2, 2021, 7:30pm

My example is more along the lines of this, computing the dot product between the gradient and a vector, which is equivalent to the directional derivative.

using Flux
using ForwardDiff

net = Chain(Dense(2, 128, relu), Dense(128, 128, relu), Dense(128, 1))
p, re = Flux.destructure(net)

x = randn(Float32, 2, 128)
dx = randn(Float32, 2, 128)

grads = Flux.gradient(p -> sum(ForwardDiff.derivative(h -> re(p)(x + h*dx), 0.0f0)), p)

This used to fail unless I defined the extra adjoints from DiffEqFlux. Now it just returns nothing.
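As a sanity check of the equivalence mentioned above (the dot product of the input gradient with `dx` is the directional derivative along `dx`), here is a minimal sketch using ForwardDiff alone. The function `g` and the vectors `x` and `dx` are made-up stand-ins for the network and data, not part of the original post.

```julia
using ForwardDiff, LinearAlgebra

# Hypothetical scalar function of a 2-vector, standing in for a network output.
g(x) = sin(x[1]) * x[2]^2 + exp(x[1] * x[2])

x  = [0.3, -1.2]
dx = [1.5, 0.7]

# Directional derivative via one forward-mode pass along dx
# (the same construction as h -> re(p)(x + h*dx) in the post).
dd_forward = ForwardDiff.derivative(h -> g(x .+ h .* dx), 0.0)

# The same quantity as the dot product of the full gradient with dx.
dd_dot = dot(ForwardDiff.gradient(g, x), dx)

println(isapprox(dd_forward, dd_dot))
```

The single `derivative` call avoids forming the full gradient, which is why this form is attractive when only the projection onto `dx` is needed.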

ChrisRackauckas · November 2, 2021, 7:48pm

If you add the DiffEqFlux adjoints does it work?

jlmaccal · November 2, 2021, 7:53pm

No, that doesn’t make a difference.

ChrisRackauckas · November 2, 2021, 7:59pm

Interesting. @mcabbott would you know something about what might’ve changed?

mcabbott · November 2, 2021, 8:10pm

Yes, this won’t work, sadly. The warning from Zygote.forwarddiff is:

Note that the function `f` will *drop gradients* for any closed-over values.

and that’s what’s being used here. That is, it’s forward-over-forward, and takes derivatives only with respect to the explicit parameter, not to anything closed over (since ForwardDiff is unaware of those).
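A minimal sketch of the behavior described above, assuming Zygote's built-in rule for `ForwardDiff.derivative` (which routes through `Zygote.forwarddiff`). The scalar closures here are hypothetical stand-ins for `re(p)`: when the closure captures `p`, the gradient with respect to `p` should come back as `nothing`, as reported in this thread, while differentiating with respect to the explicit argument works. Depending on the Zygote version, a warning about closures may also be printed.

```julia
using Zygote, ForwardDiff

# The inner closure captures p; ForwardDiff only differentiates w.r.t. t.
loss_closed(p) = ForwardDiff.derivative(t -> p * t^2, 3.0)

# Reverse mode over the ForwardDiff call: the captured p receives no gradient,
# even though the true derivative d/dp (2 * p * 3.0) is nonzero.
g_closed = Zygote.gradient(loss_closed, 2.0)

# Differentiating with respect to the explicit argument of derivative works:
# d/dx of (d/dt 2t^2 at t = x) = d/dx (4x).
g_explicit = Zygote.gradient(x -> ForwardDiff.derivative(t -> 2.0 * t^2, x), 3.0)

@show g_closed g_explicit
```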

Making it give errors when f closes over anything would be better. Making it actually work… I’m not sure, might be possible? Does DiffEqFlux.jl have (pirate?) code which handles this?

ChrisRackauckas · November 2, 2021, 8:15pm

Yes.

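using ForwardDiff, ZygoteRules

# Pullback for building a Dual from a value x and a 1-tuple of partials ẋ:
# the cotangent's partials go to x and its value goes to ẋ.
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end

# Pullback for reading d.partials: the cotangent is placed in the value slot.
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
  d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)

# Pullback for reading d.value: the cotangent is placed in the partials slot.
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
  d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)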

All of our pirated code is here, and we should upstream some of it: https://github.com/SciML/DiffEqFlux.jl/blob/v1.44.0/src/DiffEqFlux.jl#L60-L74

jlmaccal · November 2, 2021, 8:51pm

Is there any possibility of a workaround? This used to work ~1 year ago.

The only other approach I have been able to make work is ReverseDiff over Zygote, but for some reason this is super slow (I’ll create another thread about this).
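One workaround not proposed in the thread, sketched here under stated assumptions, is to avoid Zygote for the outer pass and use ForwardDiff over ForwardDiff, passing the parameters as an explicit argument to the outer call. The outer perturbation then travels inside the Dual-valued parameters themselves, so the inner closure capturing them does not lose it. The model `model`, the helper `dirderiv`, and all values are hypothetical; a Flux model would need `re(p)` to accept Dual-valued `p`, which is not shown here.

```julia
using ForwardDiff, LinearAlgebra

# Hypothetical one-hidden-layer model with all parameters in a flat vector p:
# W1 is 3x2 (entries 1:6), b1 has 3 entries (7:9), w2 has 3 entries (10:12).
function model(p, x)
    W1 = reshape(p[1:6], 3, 2)
    b1 = p[7:9]
    w2 = p[10:12]
    return dot(w2, tanh.(W1 * x .+ b1))
end

# Directional derivative of the model output along dx at input x,
# mirroring ForwardDiff.derivative(h -> re(p)(x + h*dx), 0) from the post.
dirderiv(p, x, dx) = ForwardDiff.derivative(t -> model(p, x .+ t .* dx), 0.0)

p  = collect(range(-1.0, 1.0; length=12))
x  = [0.5, -0.3]
dx = [1.0, 2.0]

# Outer forward-mode pass over the parameters. q holds outer Dual numbers,
# so the inner closure that captures q still carries the outer perturbation;
# ForwardDiff's tags keep the two nested derivatives from being confused.
grad_p = ForwardDiff.gradient(q -> dirderiv(q, x, dx), p)

@show grad_p
```

Forward mode over the parameters costs roughly one pass per chunk of parameters, so this is only practical for small models; for the 2→128→128→1 network above it is likely to be slow.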

facusapienza · January 21, 2024, 4:18pm

Has there been any update or progress on this? I recently ran into a similar problem, which I posted in “Nested and different AD methods altogether: How to add AD calculations inside my loss function when using neural differential equations?”, and I am trying to make it work. Thanks!
