# Gradient error in Flux model inputs

Hi all, I am trying to construct a loss function in Flux that is composed of a typical mean squared error component, as well as a derivative component. For some reason I get a `Can't differentiate foreigncall expression` error when I try to compute the gradient of the loss function. A minimal working example of the code is shown below. Does anybody know how to fix this/or have a work around?

``````using Flux

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)

function loss(x, y)
fitloss = Flux.Losses.mse(m(x), y) # typical loss function

derivativeloss = abs2(gradient(a -> m(a), x)) #  problematic term (only care about derivative of the 3rd input)

return fitloss + derivativeloss
end

xt = rand(3)
yt = rand(1)

loss(xt, yt)
end
``````

Note, this is essentially the same unsolved error as found here. Thanks for helping!

Hi, I would suggest you to use ForwardDiff for computing the inner derivative. In my case (differential equation solved by the neural network), it worked. You just need to add the following rules to Zygote to tackle dual numbers.

``````Pkg.add("ZygoteRules")
using ZygoteRules
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
@assert length(ẋ) == 1
ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials, (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ, 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
``````
1 Like

Thanks for suggesting this! I actually tried using ForwardDiff but it also didn’t work (same error as below). I implemented your suggestion but alas it doesn’t work. Now I get a `Mutating arrays is not supported` error Here is the code I used:

``````using Flux
using ForwardDiff
using ZygoteRules

ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
@assert length(ẋ) == 1
ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials, (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ, 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)

function loss(x, y)
fitloss = Flux.Losses.mse(m(x), y)

#     derivativeloss = abs2(gradient(a -> m(a), x)) # old version using Zygote
derivativeloss = abs2(ForwardDiff.gradient(a -> m(a), x)) # new version using ForwardDiff - still problematic

return fitloss + derivativeloss
end

xt = rand(3)
yt = rand(1)

loss(xt, yt)
end
``````

Any other suggestions (hopeful expression)? Fiy, I am trying to do something resembling physics informed neural networks, but not of the form implemented in NeuralPDEs.jl, but rather as implemented here.

1 Like

I think (not sure) that the problem is caused by your gradient calculation. I would suggest you to define your gradient function outside the loss and broadcast it over the array of values. This is my old code that solves y(x) - y’(x) = 0 by PINN, hope it will help.

``````Ο = Flux.Chain(Dense(1,T,softplus),Dense(T,1,softplus))
ο(t) = 1.0 + t*Ο([t])
dο(t) = ForwardDiff.derivative(ο,t)
𝜣 = Flux.params(Ο)

#(3) Build loss function
function 𝕰(x)
𝕰 = sum((ο.(x) .- dο.(x)).^2)
𝕷 = 𝕰
return 𝕷
end
``````
1 Like

Thanks for all the help! I updated my code and it can now compute the “gradients”, but these all end up being zero… Here is the current version:

``````# need this to get the auto-diff to work
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
@assert length(ẋ) == 1
ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials, (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ, 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
Zygote.refresh()

m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1)) # [u0, k, t] -> u(t)
ps = Flux.params(m)

function get_time_function(x) # forced to do this. FordwardDiff.gradient doesn't work...
mt(t) = m([x[1:2];t])
return mt
end

function loss(x, y) # x and y are arrays
derivativeloss = 0.0f0
for i=1:size(x, 2)

f = get_time_function(x[:, i]) # this feels very clunky...
dmt(t) = ForwardDiff.derivative(f, t) # dNN/dt @ x[:, i]
derivativeloss += Float32(dmt(x))

end

return derivativeloss
end

xts = rand(3, 10)
yts = rand(1, 10)

Update: I created an issue on Flux’s Github page and they solved the one dimensional case in a very elegant way, see here. To solve the multidimensional case one needs to write a fallback rule for `getindex`, see the linked issue. I will update this once I have a more concrete answer for the multidimensional case.