CUDA.jl Differentiation Error

Dear All,

while working on my project (epidemic model, solved using PINN), I tried to port my code to GPU. While function (loss function of my neural network) works and delivers reasonable speed up, while computing gradient of loss function w.r.t. parameters of the neural network, I am getting the following differentiation error. Could somebody please give me some guidance, how to solve this error? Any guidance will be very welcome.

MethodError: no method matching Float32(::ForwardDiff.Dual{Nothing,Float64,1})
Closest candidates are:
  Float32(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float32(::T) where T<:Number at boot.jl:715
  Float32(!Matched::Int8) at float.jl:60

Minimal working example code is here.

#*-*-*-*-*-*-*-*-*-*-* Solve for Value and Policy Functions -*-*-*-*-*-*-*-*-*-*

#(1) Install and initialize packages
using Pkg
using Plots
using Parameters
using LinearAlgebra
using SpecialFunctions
using CUDA
using Flux
using Conda
using PyCall
using Random
using Distributions
using ForwardDiff
using Tullio
using ZygoteRules
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
  d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
  d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)

#(2) Build model structure
@with_kw struct Model
    #(A) Structural parameters
    Ξ²::Float64 =0.96^(1/52)
    ℬ::Float64 = 0.25
    Ξ±::Float64 = 39.8352
    Ο†::Float64 = 0.8
    ΞΈ::Float64 = 0.0013
    Ο€1::Float64 = 6.8838e-08
    Ο€2::Float64 = 1.0924e-04
    Ο€3::Float64 = 0.3426
    Ο€r::Float64 = 0.3869
    Ο€d::Float64 = 0.0019
    ΞΌ0::Float64 = 0.0
    #(B) Networks and ADAM setting
    S1::Float64 = 0.0
    S2::Float64 = 1.0
    Ο°::Int64 = 2500
    Ο‘::Int64 = 150
    T::Int64 = 32
    Ξ“::Float64 = 0.001
    Ξ²1::Float64 = 0.9
    Ξ²2::Float64 = 0.99
    Ο΅::Float64 = 10^(-8)
    Ξ½::Float32 = 10^(-5)
    𝚰::Int64 = 500000
    𝛀::Float64 = 0.001
    𝒯::Int64 = 1500
    𝓇::Float64 = 0.9
Mod1 = Model(Ξ“=0.001,𝓇=0.99,T=32)

@unpack T,Ο° = Mod1

#(3) Generate grid
function RnGrid(Model)
    @unpack S1,S2,Ο°,Ο€d = Model
    eGrid = rand(Uniform(S1,S2),Ο°^2,3)
    eGrid = reshape(eGrid,3,Ο°^2)
    tGrid = Array{Float64}(undef,Ο°^2)
    n,m = size(eGrid)
    for i in 1:m
        tGrid[i] = sum(eGrid[:,i])
    Ind = findall(x->x<=1.0,tGrid)
    Grid = eGrid[:,Ind]
    Grid = Grid[:,1:Ο°]
    return Grid
Grid = RnGrid(Mod1)
Grid = Grid |>gpu

#(4) Initialize neural networks
#(4.1) Policy network
bent(x) = (sqrt(x^2+1)-1)/2 + x
Ο† = Chain(Dense(3,T,swish),Dense(T,T,swish),
Dense(T,T,swish),Dense(T,1,swish)) |>gpu

Cr = 1104.8296628335625
Nr = 27.73500981126146
W = 8292.589822461223
Ci = 883.8637302668501
Ni = 27.73500981126146
Ξ§ = 8251.57380479189
cr = CUDA.ones(Ο°)*Cr
w = CUDA.ones(Ο°)*W

c(x) = cr' - x[2,:]'.*Ο†(x)

#(4.2) Value network
Ο‰ = Chain(Dense(3,T,swish),Dense(T,T,swish),
Dense(T,T,swish),Dense(T,1,swish)) |>gpu

𝒱(x) = w' - x[2,:]'.*Ο‰(x)

#(5) Build law of motion
function 𝓗(𝛀,Cs,Ns)
    𝓒 = 𝛀[1,:]' - Ο€1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' - Ο€2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' - Ο€3.*𝛀[1,:]'.*𝛀[2,:]'
    π“˜ = (1-Ο€r-Ο€d).*𝛀[2,:]' + Ο€1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' + Ο€2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' + Ο€3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓑 = 𝛀[3,:]' + Ο€r.*𝛀[2,:]'
    𝞨 = vcat(𝓒,π“˜,𝓑)
    return 𝞨

#(6) Build residual function
@unpack Ξ±,Ξ²,ΞΈ,Ο€1,Ο€2,Ο€3,Ο€d,Ο€r,Ο° = Mod1
function 𝕽(x)
    Cs = c(x)
    Ns = Cs./Ξ±
    V = 𝒱(x)
    𝞨 = 𝓗(x,Cs,Ns)
    π“₯ = 𝒱(𝞨)
    Ο„ = Ο€1*Ci.*Cs.*x[2,:]' + Ο€2*Ni.*Ns.*x[2,:]' + Ο€3*x[2,:]'
    𝓔 = sum((Ξ²*(π“₯.-Ξ§).*(Ο€2*Ni*x[2,:]'+Ξ±*Ο€1*Ci.*x[2,:]') - (Ξ±./Cs-ΞΈ.*Ns)).^2)
    𝓑 = sum((V - log.(Cs) + ΞΈ.*(Ns.^2)./2 - Ξ².*(1 .-Ο„).*π“₯ - Ξ².*Ο„.*Ξ§).^2)
    𝓑 = 𝓔 + 𝓑
    return 𝓑

𝚯 = Flux.params(Ο†,Ο‰)

using BenchmarkTools

@benchmark 𝕽(Grid)

βˆ‡π•½ = Flux.gradient(()->𝕽(Grid),Flux.params(Ο†,Ο‰))


Any idea, why this thing doesn’t work? I am really puzzled because on CPU, I was able to train this model without any problems, and on GPU function itself works, but I am getting differentiation error, and the only change I have made is literally |>gpu. @ChrisRackauckas or somebody else, I would be grateful for any idea/guidance, how to debug this thing.

Hi, maybe it helps if you provide a real minimal working example. The one above is huge, e.g plots won’t be needed to reproduce your problem.

1 Like

Indeed @dhairyagandhi96 was going to look into this one, but if you want someone to look at it quicker it would be very helpful to get this minimized. There’s like 15 packages in here, and I don’t think all of them are needed to recreate the error. If you can delete a bunch of stuff and still get the same error then you’ll get a quicker response, otherwise it needs to wait until someone has the time to minimize it.

1 Like

Yeah a more minimal example would certainly be better, although it’s something I expect would be straightforward to fix

1 Like

@ChrisRackauckas @dhairyagandhi96 Thank you for the response and sorry for too much code. I solved it right now. It was really stupid mistake, I forget that constants are defined as Float64, which caused the differentiation error. The function itself worked, but it crashed with dual numbers (Float32(::ForwardDiff.Dual{Nothing,Float64,1})). Now it works ok.

I would like to ask one thing about performance. How it is possible, that benchmarking of the residual function on GPU vs CPU gave me 10X speed up, but when I benchmark gradient of it, CPU is 2X faster?


1 Like