Hi all, I am trying to construct a loss function in Flux that is composed of a typical mean squared error component as well as a derivative component. For some reason I get a Can't differentiate foreigncall expression error when I try to compute the gradient of the loss function. A minimal working example of the code is shown below. Does anybody know how to fix this or have a workaround?
using Flux
m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)
function loss(x, y)
    fitloss = Flux.Losses.mse(m(x), y) # typical loss function
    derivativeloss = abs2(gradient(a -> m(a)[1], x)[1][3]) # problematic term (only care about derivative of the 3rd input)
    return fitloss + derivativeloss
end
xt = rand(3)
yt = rand(1)
gs = gradient(ps) do
    loss(xt, yt)
end
Note, this is essentially the same unsolved error as found here. Thanks for helping!
Hi, I would suggest using ForwardDiff for computing the inner derivative. In my case (a differential equation solved by a neural network), it worked. You just need to add the following rules to Zygote so it can handle dual numbers.
using Pkg; Pkg.add("ZygoteRules")
using ZygoteRules
using ForwardDiff
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
    @assert length(ẋ) == 1
    ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
    d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
    d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
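With these rules in place, the idea is to compute the inner derivative with ForwardDiff while Zygote only handles the outer gradient over the parameters. For your model, the problematic term would then become something like the line below (an untested sketch, just to illustrate the substitution):
derivativeloss = abs2(ForwardDiff.gradient(a -> m(a)[1], x)[3]) # inner derivative taken with ForwardDiff, keeping only the 3rd component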
Thanks for suggesting this! I actually tried using ForwardDiff, but it also didn't work (same error as below). I implemented your suggestion, but alas it doesn't work: now I get a Mutating arrays is not supported error. Here is the code I used:
using Flux
using ForwardDiff
using ZygoteRules
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
    @assert length(ẋ) == 1
    ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1))
ps = Flux.params(m)
function loss(x, y)
    fitloss = Flux.Losses.mse(m(x), y)
    # derivativeloss = abs2(gradient(a -> m(a)[1], x)[1][3]) # old version using Zygote
    derivativeloss = abs2(ForwardDiff.gradient(a -> m(a)[1], x)[3]) # new version using ForwardDiff - still problematic
    return fitloss + derivativeloss
end
xt = rand(3)
yt = rand(1)
gs = gradient(ps) do
    loss(xt, yt)
end
Any other suggestions (hopeful expression)? FYI, I am trying to do something resembling physics-informed neural networks, but not of the form implemented in NeuralPDEs.jl, rather as implemented here.
I think (though I am not sure) that the problem is caused by your gradient calculation. I would suggest defining your derivative function outside the loss and broadcasting it over the array of values. This is my old code that solves y(x) - y'(x) = 0 with a PINN; I hope it helps.
Ο = Flux.Chain(Dense(1, T, softplus), Dense(T, 1, softplus)) # T is the hidden-layer width
ο(t) = 1.0 + t*Ο([t])[1]              # trial solution, constructed so that ο(0) = 1
dο(t) = ForwardDiff.derivative(ο, t)  # derivative of the trial solution w.r.t. t
𝜣 = Flux.params(Ο)
# (3) Build loss function
function 𝕰(x)
    𝕰 = sum((ο.(x) .- dο.(x)).^2) # squared residual of y(x) - y'(x) = 0 over the collocation points x
    𝕷 = 𝕰
    return 𝕷
end
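For your three-input network, the same idea would look roughly like the sketch below (untested, and I have not checked that Zygote can differentiate through every step; dm3 and the two-argument loss are just illustrative names). The helper keeps the first two inputs fixed and differentiates only with respect to the third, and the loss then broadcasts it over the columns of the input batch.
# hypothetical adaptation of the pattern above to a 3-input network m
dm3(x) = ForwardDiff.derivative(t -> m([x[1], x[2], t])[1], x[3]) # d m(x) / d x[3]
function loss(X, Y) # columns of X are the samples
    fitloss = Flux.Losses.mse(m(X), Y)
    derivativeloss = sum(abs2, dm3.(eachcol(X))) # broadcast the derivative helper over the columns
    return fitloss + derivativeloss
end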
Thanks for all the help! I updated my code and it can now compute the “gradients”, but these all end up being zero… Here is the current version:
using Flux, ForwardDiff, Zygote, ZygoteRules

# need these adjoint rules to get the nested auto-diff to work
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
    @assert length(ẋ) == 1
    ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T = d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T = d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
Zygote.refresh()
m = Chain(Dense(3, 10, relu), Dense(10, 10, relu), Dense(10, 1)) # [u0, k, t] -> u(t)
ps = Flux.params(m)
function get_time_function(x) # forced to do this; ForwardDiff.gradient doesn't work...
    mt(t) = m([x[1:2]; t])[1]
    return mt
end
function loss(x, y) # x and y are arrays
    derivativeloss = 0.0f0
    for i = 1:size(x, 2)
        f = get_time_function(x[:, i]) # this feels very clunky...
        dmt(t) = ForwardDiff.derivative(f, t) # dNN/dt @ x[:, i]
        derivativeloss += Float32(dmt(x[3, i])) # evaluate at the time component of sample i
    end
    return derivativeloss
end
xts = rand(3, 10)
yts = rand(1, 10)
gs = gradient(ps) do
    loss(xts, yts)
end # all these gradients end up being zero...
Zygote/Flux keeps throwing errors no matter how I change the scheme… However, there are a number of issues on Flux's GitHub where people have run into similar problems, see here and here. I will ask there as well.
Update: I created an issue on Flux's GitHub page and they solved the one-dimensional case in a very elegant way, see here. To solve the multidimensional case one needs to write a fallback rule for getindex; see the linked issue. I will update this once I have a more concrete answer for the multidimensional case.