Flux differentiation error

Dear All,

I am working on solving the simple differential equation y(x) - y′(x) = 0 with a neural network. I parameterize y(x) with a two-layer softplus network, and to get y′(x) I use plain Flux.gradient. The gradient itself works well (d𝛰.(xGrid) generates the derivative at all grid points, for example), but when I put it into the loss function and run the Flux.train! loop, I get the following error:

"Can't differentiate foreigncall expression"

Could somebody please give me some guidance on how to fix this problem?

Best,
Honza

```julia
#(B) Solve the simple differential equation y(x) - y'(x) = 0
using Flux, Distributions

#(1) Generate grid (ϰ = number of grid points, defined elsewhere in my script)
xGrid = sort(rand(Uniform(-1,1), 1, ϰ), dims=2)   # xGrid is 1×ϰ, so sort along dims=2

#(2) Build neural network and its derivative
Ο = Flux.Chain(Dense(1,16,softplus), Dense(16,1,softplus))
𝛰(x) = Ο([x])[1]
d𝛰(x) = Flux.gradient(𝛰, x)[1]   # derivative w.r.t. the input, via Flux.gradient as described above

#(3) Build loss function: ODE residual plus the boundary condition y(1) = 1
function 𝕰(x)
    𝕽 = sum((𝛰.(x) .- d𝛰.(x)).^2)   # residual of y - y' = 0 on the grid
    𝕭 = (𝛰(1) - 1)^2                # boundary condition
    return 𝕽 + 𝕭
end

𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM()   # optimizer (defined elsewhere in my script)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰, 𝜣, Data, opt, cb=cb)
```

To be more specific, I would like to ask how to take the derivative of a neural network such that Flux is able to differentiate it again.
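One pattern that sidesteps reverse-over-reverse entirely is nesting forward mode twice. This is a minimal sketch with a toy scalar function standing in for the network (plain ForwardDiff, no Flux), just to illustrate that nested differentiation itself is well supported:

```julia
using ForwardDiff

f(x) = exp(2x)                          # toy stand-in for the network output
df(x)  = ForwardDiff.derivative(f, x)   # f'(x)  = 2·exp(2x)
ddf(x) = ForwardDiff.derivative(df, x)  # f''(x) = 4·exp(2x), forward-over-forward

df(0.0)   # → 2.0
ddf(0.0)  # → 4.0
```

ForwardDiff handles the nesting via perturbation tags, so the inner and outer derivatives do not get confused.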

Edit: I tried to work around this problem by computing the derivative with ForwardDiff instead. I computed the first derivative of the network using ForwardDiff, and as a trial computed the second derivative by applying Flux.gradient to the ForwardDiff derivative. That worked without any problems. However, when I plugged the ForwardDiff derivative into the loss function (the loss function itself works) and tried to optimize it in the Flux.train! loop, I got the following error:

```text
TypeError: in typeassert, expected Float32, got ForwardDiff.Dual{Nothing,Float32,1}
in top-level scope at base\util.jl:175
in macro expansion at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\Fj3bt\src\optimise\train.jl:122
in  at Flux\Fj3bt\src\optimise\train.jl:79
in #train!#12 at Flux\Fj3bt\src\optimise\train.jl:81
in macro expansion at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\Fj3bt\src\optimise\train.jl:92
in update! at Flux\Fj3bt\src\optimise\train.jl:31
in update! at Flux\Fj3bt\src\optimise\train.jl:25
in apply! at Flux\Fj3bt\src\optimise\optimisers.jl:175
in macro expansion at base\simdloop.jl:77
in setindex! at base\multidimensional.jl:545
in setindex! at base\array.jl:828

``````
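The trace above bottoms out in `setindex!`. A minimal reproduction of this failure mode, independent of Flux (my reading, not from the thread): the optimizer writes the gradient back into the Float32 parameter array element-wise, which requires converting each element to Float32, and there is no `Float32(::ForwardDiff.Dual)` method, so a Dual-valued gradient throws:

```julia
using ForwardDiff

d = ForwardDiff.Dual{Nothing}(1.0f0, 1.0f0)   # a Dual{Nothing,Float32,1}
W = zeros(Float32, 2)                          # stand-in for a parameter array

threw = try
    W[1] = d    # setindex! must convert Dual → Float32: fails
    false
catch err
    true
end
```

This is why the error only appears inside `Flux.train!` and not when evaluating the loss or the gradient by hand.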

Code

```julia
#(B) Solve the simple differential equation y(x) - y'(x) = 0
using Flux, ForwardDiff, Distributions

#(1) Generate grid (ϰ = number of grid points, defined elsewhere)
xGrid = sort(rand(Uniform(-1,1), 1, ϰ), dims=2)

#(2) Build neural network and its derivative
Ο = Flux.Chain(Dense(1,16,softplus), Dense(16,1,softplus))
ο(t) = Ο([t])[1]
dο(t) = ForwardDiff.derivative(ο, t)   # first derivative via forward mode

#(3) Build loss function: ODE residual plus the boundary condition y(1) = 1
function 𝕰(x)
    𝕽 = sum((ο.(x) .- dο.(x)).^2)
    𝕭 = (ο(1) - 1)^2
    return 𝕽 + 𝕭
end

𝕰(xGrid)

𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM()   # optimizer (defined elsewhere)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰, 𝜣, Data, opt, cb=cb)
```

Edit 2: I made ddο(t) work using the fix suggested by @ChrisRackauckas:

```julia
# ForwardDiff integration: Zygote adjoints for Dual construction and field access
using ZygoteRules, ForwardDiff

ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
    @assert length(ẋ) == 1
    ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
    d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
    d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)
```

However, I still can't get the gradient of the loss function; I am getting the following error:
MethodError: no method matching Float32(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float64},Float64,1})

Does somebody have an idea how to fix this (i.e., use ForwardDiff or another autodiff inside the loss function and still be able to train it in Flux)?

You’re mixing Float64 and Float32: you might want to make everything Float32.


Dear Chris, thank you. So I should convert xGrid to Float32? Is it a problem with the input (xGrid), or do I need to fix something inside the loss function?

Yes, you probably want to convert what comes out of `rand` to a Float32 (or make it directly sample Float32s).
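Both options can be sketched as follows (`ϰ` is the grid size from the thread; the rescaling in option 2 is my own way of sampling Uniform(-1,1) directly in Float32):

```julia
using Distributions

ϰ = 8   # hypothetical grid size for the sketch

# option 1: sample Float64 via Distributions, then convert
xGrid = Float32.(sort(rand(Uniform(-1, 1), 1, ϰ), dims=2))

# option 2: sample Float32 directly and rescale [0,1) → [-1,1)
xGrid2 = sort(rand(Float32, 1, ϰ) .* 2f0 .- 1f0, dims=2)
```

Either way, the array that reaches the network is `Float32`, matching the default element type of `Dense` layers.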


I tried that, and I still get this error:

```text
MethodError: no method matching Float32(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float64,1})
```

Code

```julia
#(B) Solve the simple differential equation y(x) - y'(x) = 0
using Flux, ForwardDiff, Distributions

#(1) Generate grid (ϰ = number of grid points, defined elsewhere)
xGrid = sort(rand(Uniform(-1,1), 1, ϰ), dims=2)
xGrid = convert(Array{Float32}, xGrid)

#(2) Build neural network and its derivatives
Ο = Flux.Chain(Dense(1,16,softplus), Dense(16,1,softplus))
ο(t) = Ο([t])[1]
dο(t) = ForwardDiff.derivative(ο, t)
ddο(t) = ForwardDiff.derivative(dο, t)   # second derivative (definition omitted in my earlier post)

dο(5)
ddο(5)
ddο.(xGrid)

#(3) Build loss function
function 𝕰(x)
    𝕽 = sum((ο.(x) .- dο.(x)).^2)
    𝕭 = (ο(1) - 1)^2
    return 𝕽 + 𝕭
end

𝕰(xGrid)

𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM()   # optimizer (defined elsewhere)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰, 𝜣, Data, opt, cb=cb)
```

Hi, sorry for spamming with this problem. I differentiated my loss function with respect to the parameters of the neural network and got the following gradient, so Zygote was able to differentiate the ForwardDiff part.

```julia
∇𝕰 = Flux.gradient(() -> 𝕰(xGrid), 𝜣)
```

However, the gradient has the following structure (it contains dual numbers, which is why the parameter update fails in the Flux.train! loop). It looks like the fix suggested by @ChrisRackauckas doesn't work here. Any idea how to fix this?

```text
IdDict{Any, Any} with 5 entries
16×1 Array{Float32,2}:
=> 16×1 Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1},2}:
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.2634682,1271.0961)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.036139756,-196.31708)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.2237638,-1756.2749)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.14364085,-776.0406)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.0034865336,-18.079405)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.27216268,-1725.1829)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.07260824,479.56946)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.020844292,-140.64598)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.23060206,1787.3832)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.17264761,-1272.322)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.33930257,-1820.1713)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.056477264,478.3624)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.3520169,-1696.228)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.039859835,272.59702)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.19189215,-1461.739)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.08595074,513.59863)
Vector{Float32} with 16 elements
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
=>
Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1}} with 16 elements
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.26234788,357.27512)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.03616306,-26.888998)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.22294879,373.919)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.14253888,-111.39714)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.0034794048,-3.4679422)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.27468497,44.10873)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.07365812,-31.0434)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.020888403,11.870704)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.23005879,-363.95084)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.17167568,205.64714)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.33812273,-275.50446)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.05754064,-129.68915)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.35195476,-477.36105)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.04028002,-26.315512)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.19163737,276.66525)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.08639604,17.339998)
1×16 Array{Float32,2}:
-0.407974  0.0630915  0.566018  0.249386  …  -0.0877669  0.470995  -0.165193
=> 1×16 Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1},2}:
Vector{Float32} with 1 element
0.00
=>
Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1}} with 1 element
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(1.009891,-46.851746)
:(Main.xGrid) => 1×2500 Array{Float32,2}:

``````

Do I understand correctly that it computed the gradient correctly, and I just need to "de-dualize" it? Is there an easy way to do that, ideally so that the Flux.train! loop works?
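A "de-dualize" helper can be sketched like this (my own helper, not an official Flux API): walk the gradient and replace every `ForwardDiff.Dual` by its primal value so the optimizer can write it back into the Float32 parameter arrays.

```julia
using ForwardDiff

dedual(x::Real) = x                                       # plain reals pass through
dedual(d::ForwardDiff.Dual) = dedual(ForwardDiff.value(d)) # recurse, handles nested Duals
dedual(A::AbstractArray) = dedual.(A)                      # map over gradient arrays

g = [ForwardDiff.Dual{Nothing}(0.5f0, 2.0f0),
     ForwardDiff.Dual{Nothing}(-1.0f0, 3.0f0)]
dedual(g)   # → Float32[0.5, -1.0]
```

For a Flux gradient one would apply `dedual` to each entry of the `Grads`/`IdDict` before calling `Flux.update!`.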

Hi, any guidance on this problem? Sorry for spamming, but I desperately need advice. I think it should be something simple, but I can't find the solution. @ChrisRackauckas @MikeInnes

Yes, GalacticOptim.jl actually hard codes the workaround:

@ChrisRackauckas Thank you very much! I did something like that on my own, just very inefficiently. So I just need to plug my loss function into GalacticOptim?

I think so. If it doesn't work, let me know.

@ChrisRackauckas Thank you very much! I will try it! Is there some example of how to pass a user-defined loss function containing a Flux neural network to GalacticOptim.jl?

PS: I made it work using my own implementation of ADAM that manually de-dualizes the gradient. Thank you for your guidance!
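A minimal sketch of what "de-dualizing inside the update" can look like (plain gradient descent standing in for the poster's ADAM implementation; the function name is my own):

```julia
using ForwardDiff

# one SGD-style step that tolerates Dual-valued gradients
function update_dedual!(W, g; η = 0.1f0)
    @. W -= η * ForwardDiff.value(g)   # value() strips a Dual; no-op on plain reals
    return W
end

W = Float32[1.0, 2.0]
g = [ForwardDiff.Dual{Nothing}(0.5f0, 9.0f0),
     ForwardDiff.Dual{Nothing}(1.0f0, 9.0f0)]
update_dedual!(W, g)   # W ← W − 0.1 · [0.5, 1.0]
```

A real ADAM would additionally keep the first/second moment buffers, but the Dual-stripping point is the same: call `ForwardDiff.value` before mutating any Float32 state.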

No problem!

And for the future, see this update on where our AD tools are going: DifferentialEquations - Derivatives in ODE function/ nesting AD


Hi Chris, I tried to install GalacticOptim.jl, and I got the following error. Is there a tutorial on how to solve this?

Also, I would like to ask how to optimize a loss function that contains a Flux neural network with those implicit parameters. Do I simply collect them using Flux.params?

```text
ERROR: Unsatisfiable requirements detected for package Compat [34da2185]:
Compat [34da2185] log:
├─possible versions are: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0] or uninstalled
├─restricted by compatibility requirements with BlackBoxOptim [a134a8b2] to versions: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0]
│ └─BlackBoxOptim [a134a8b2] log:
│   ├─possible versions are: [0.4.0, 0.5.0] or uninstalled
│   └─restricted to versions * by an explicit requirement, leaving only versions [0.4.0, 0.5.0]
├─restricted by compatibility requirements with TensorFlow [1d978283] to versions: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1]
│ └─TensorFlow [1d978283] log:
│   ├─possible versions are: [0.10.2, 0.10.4, 0.11.0] or uninstalled
│   └─restricted to versions * by an explicit requirement, leaving only versions [0.10.2, 0.10.4, 0.11.0]
└─restricted by compatibility requirements with Optim [429524aa] to versions: [3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0] — no versions left
└─Optim [429524aa] log:
├─possible versions are: [0.15.3, 0.16.0, 0.17.0-0.17.2, 0.18.0-0.18.1, 0.19.0-0.19.7, 0.20.0-0.20.6, 0.21.0, 0.22.0, 1.0.0, 1.1.0, 1.2.0] or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions [0.15.3, 0.16.0, 0.17.0-0.17.2, 0.18.0-0.18.1, 0.19.0-0.19.7, 0.20.0-0.20.6, 0.21.0, 0.22.0, 1.0.0, 1.1.0, 1.2.0]
└─restricted by compatibility requirements with GalacticOptim [a75be94c] to versions: [0.22.0, 1.0.0, 1.1.0, 1.2.0]
└─GalacticOptim [a75be94c] log:
├─possible versions are: [0.1.0-0.1.3, 0.2.0-0.2.2, 0.3.0-0.3.1, 0.4.0-0.4.1] or uninstalled
└─restricted to versions * by an explicit requirement, leaving only versions [0.1.0-0.1.3, 0.2.0-0.2.2, 0.3.0-0.3.1, 0.4.0-0.4.1]
``````

I think TensorFlow.jl might be implicitly upper-bounding Compat.jl? @oxinabox

Though this is an entirely different topic, so it shouldn't be in the same thread.


Having TensorFlow.jl in your dependency tree seems like a mistake.
It's pretty stale these days. It works, but it is bound to an old version of LibTensorFlow, and it's just less fun to use than Flux etc.

How should I kill it?

`] rm TensorFlow`, I guess.
I recommend reading the package manager's docs.
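For reference, the same removal via the Pkg API rather than the Pkg REPL (this assumes TensorFlow is actually a dependency of the active project, otherwise `Pkg.rm` errors):

```julia
using Pkg
Pkg.rm("TensorFlow")   # equivalent to `] rm TensorFlow` in the Pkg REPL
```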


Thank you, it works!