Gradient error in Flux model inputs

Hi all, I am trying to construct a loss function in Flux that is composed of a typical mean squared error component as well as a derivative component. For some reason I get a "Can't differentiate foreigncall expression" error when I try to compute the gradient of the loss function. A minimal working example is shown below. Does anybody know how to fix this, or have a workaround?

using Flux

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)

function loss(x, y)
    fitloss = Flux.Losses.mse(m(x), y) # typical loss function

    derivativeloss = abs2(gradient(a -> m(a)[1], x)[1][3]) # problematic term (only care about the derivative of the 3rd input)
    
    return fitloss + derivativeloss
end

xt = rand(3)
yt = rand(1)

gs = gradient(ps) do
    loss(xt, yt)
end

Note: this is essentially the same unsolved error as found here. Thanks for any help!

Hi, I would suggest using ForwardDiff to compute the inner derivative. :wink: In my case (a differential equation solved by a neural network), it worked. You just need to add the following adjoint rules to Zygote so it can handle dual numbers.

Pkg.add("ZygoteRules")
using ZygoteRules
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
  d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
  d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
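
With these rules in place, an inner ForwardDiff.derivative can sit inside an outer Zygote gradient over the parameters. A toy sketch of the pattern (my illustration, not from my original script; exact behaviour may depend on package versions):

using ForwardDiff, Zygote
w = [2.0]
f(t) = w[1] * t^2                      # toy "model": f(t) = w*t^2, so df/dt = 2*w*t
df(t) = ForwardDiff.derivative(f, t)   # inner derivative computed with dual numbers
Zygote.gradient(() -> df(3.0), Zygote.Params([w]))[w]   # d/dw of 2*w*t at t = 3, i.e. [6.0]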

Thanks for suggesting this! I actually tried using ForwardDiff, but it also didn't work (same error as below). I implemented your suggestion, but alas it doesn't work either: now I get a "Mutating arrays is not supported" error :frowning: Here is the code I used:

using Flux
using ForwardDiff
using ZygoteRules

ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)

function loss(x, y)
    fitloss = Flux.Losses.mse(m(x), y)

#     derivativeloss = abs2(gradient(a -> m(a)[1], x)[1][3]) # old version using Zygote 
    derivativeloss = abs2(ForwardDiff.gradient(a -> m(a)[1], x)[3]) # new version using ForwardDiff - still problematic
    
    return fitloss + derivativeloss
end

xt = rand(3)
yt = rand(1)

gs = gradient(ps) do
    loss(xt, yt)
end

Any other suggestions (hopeful expression)? FYI, I am trying to do something resembling physics-informed neural networks, but not of the form implemented in NeuralPDE.jl; rather, as implemented here.


I think (though I am not sure) that the problem is caused by your gradient calculation. I would suggest defining your gradient function outside the loss and broadcasting it over the array of values. This is my old code that solves y(x) - y'(x) = 0 with a PINN; I hope it helps.

using Flux, ForwardDiff

T = 20                                   # hidden layer width (not fixed in my original snippet)
Ο = Flux.Chain(Dense(1, T, softplus), Dense(T, 1, softplus))
ο(t) = 1.0 + t*Ο([t])[1]                 # trial solution, built so that ο(0) = 1
dο(t) = ForwardDiff.derivative(ο, t)     # dο/dt via ForwardDiff
𝜣 = Flux.params(Ο)

# (3) Build loss function: squared residual of y - y' = 0 at each point
# (assigning to 𝕰 inside function 𝕰 is an error in Julia, so assign to 𝕷 directly)
function 𝕰(x)
    𝕷 = sum((ο.(x) .- dο.(x)).^2)
    return 𝕷
end
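
If you want to train it, the usual implicit-parameter loop applies; a sketch (the collocation points, optimiser and epoch count here are placeholders, not from my original script):

xs = collect(range(0.0, 1.0; length = 32))   # collocation points (placeholder domain)
opt = Descent(0.01)
for epoch in 1:1000
    gs = gradient(() -> 𝕰(xs), 𝜣)           # needs the dual-number adjoints from earlier
    Flux.Optimise.update!(opt, 𝜣, gs)
end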

Thanks for all the help! I updated my code and it can now compute the “gradients”, but these all end up being zero… Here is the current version:

using Flux, ForwardDiff, Zygote, ZygoteRules

# need these adjoint rules to get the nested auto-diff to work
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
Zygote.refresh()

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1)) # [u0, k, t] -> u(t)
ps = Flux.params(m)

function get_time_function(x) # forced to do this; ForwardDiff.gradient doesn't work...
    mt(t) = m([x[1:2];t])[1]
    return mt
end

function loss(x, y) # x and y are arrays
    derivativeloss = 0.0f0 
    for i=1:size(x, 2)
        
        f = get_time_function(x[:, i]) # this feels very clunky... 
        dmt(t) = ForwardDiff.derivative(f, t) # dNN/dt @ x[:, i]
        derivativeloss += Float32(dmt(x[3, i])) # dNN/dt at sample i's own time (x[3] only ever picked the first column)

    end
    
    return derivativeloss
end

xts = rand(3, 10)
yts = rand(1, 10)

gs = gradient(ps) do
    loss(xts, yts)
end # all these gradients end up being zero... 
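
One quick way I found to narrow this down (just a sanity-check sketch of my own) is to confirm that the parameters do receive nonzero gradients from a plain MSE term alone, which isolates the derivative term as the culprit:

# if this returns false, the MSE path is fine and the zeros come from the derivative term
gs_mse = gradient(() -> Flux.Losses.mse(m(xts), yts), ps)
all(p -> all(iszero, gs_mse[p]), ps)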

Zygote/Flux keeps throwing errors no matter how I change the scheme… However, there seem to be a number of issues on Flux's GitHub where people have run into similar problems, see here and here. I will ask there as well.

Update: I created an issue on Flux's GitHub page and they solved the one-dimensional case in a very elegant way, see here. To solve the multidimensional case, one needs to write a fallback rule for getindex; see the linked issue. I will update this once I have a more concrete answer for the multidimensional case.
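
One way to get the one-dimensional derivative without a nested gradient call (a rough sketch of my own; see the linked issue for the actual elegant solution) is to push a ForwardDiff.Dual through the network by hand and read off the partial, so only plain dual arithmetic has to be differentiated by the outer Zygote pass:

using ForwardDiff: Dual, partials
# seed only the 3rd input with a unit perturbation, then read off dNN/dx3
dm3(x) = partials(m([x[1], x[2], Dual(x[3], 1.0)])[1], 1)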
