CUDA.jl Differentiation Error

Honza9723 · January 7, 2021, 9:43pm

Dear All,

while working on my project (an epidemic model solved using a PINN), I tried to port my code to the GPU. The function itself (the loss function of my neural network) works and delivers a reasonable speed-up, but when I compute the gradient of the loss function with respect to the parameters of the neural network, I get the following differentiation error. Could somebody please give me some guidance on how to solve it? Any guidance will be very welcome.

MethodError: no method matching Float32(::ForwardDiff.Dual{Nothing,Float64,1})
Closest candidates are:
  Float32(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float32(::T) where T<:Number at boot.jl:715
  Float32(!Matched::Int8) at float.jl:60
  ...

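For context on the message: a ForwardDiff `Dual{Tag,V,N}` carries a value of type `V` plus `N` partial derivatives, so `Dual{Nothing,Float64,1}` is a Float64-valued dual with one partial. The sketch below uses a hypothetical, stripped-down `ToyDual` type (an illustration only, not ForwardDiff's implementation) to show how ordinary promotion lets a Float64 constant turn a Float32 dual into a Float64 one.

```julia
# Hypothetical minimal dual number (illustration only, not ForwardDiff's code).
# Like ForwardDiff.Dual{Tag,V,N}, it carries a value of type V and its derivative;
# unlike it, there is no tag and exactly one partial.
struct ToyDual{V<:Real} <: Real
    value::V
    partial::V
end
# Promote mixed value/partial types, e.g. ToyDual(1.0, 2f0) -> ToyDual{Float64}
ToyDual(v::Real, p::Real) = ToyDual(promote(v, p)...)

# Product rule, scaling by a constant, and adding a constant
Base.:*(a::ToyDual, b::ToyDual) =
    ToyDual(a.value * b.value, a.value * b.partial + a.partial * b.value)
Base.:*(c::Real, d::ToyDual) = ToyDual(c * d.value, c * d.partial)
Base.:+(d::ToyDual, c::Real) = ToyDual(d.value + c, d.partial)

f64(x) = 2.0 * (x * x) + 1.0   # Float64 constants
f32(x) = 2f0 * (x * x) + 1f0   # Float32 constants

x = ToyDual(3f0, 1f0)          # Float32 input, seeded with dx/dx = 1
d64 = f64(x)
d32 = f32(x)
println(typeof(d64), " ", d64.value, " ", d64.partial)  # ToyDual{Float64} 19.0 12.0
println(typeof(d32), " ", d32.value, " ", d32.partial)  # ToyDual{Float32} 19.0 12.0
```

The derivative is the same in both cases (f'(3) = 12); only the value type changes, which is what a Float32 GPU buffer later trips over.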
Here is the minimal working example:

#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
#*-*-*-*-*-*-*-*-*-*-* Solve for Value and Policy Functions -*-*-*-*-*-*-*-*-*-*
#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

#(1) Install and initialize packages
using Pkg
Pkg.add("Plots")
Pkg.add("Parameters")
Pkg.add("LinearAlgebra")
Pkg.add("SpecialFunctions")
Pkg.add("CUDA")
Pkg.add("Flux")
Pkg.build("Flux")
Pkg.add("Conda")
Pkg.add("PyCall")
Pkg.add("Random")
Pkg.add("Distributions")
Pkg.add("ForwardDiff")
Pkg.add("KernelAbstractions")
Pkg.add("Tullio")
using Plots
using Parameters
using LinearAlgebra
using SpecialFunctions
using CUDA
using Flux
using Conda
using PyCall
using Random
using Distributions
using ForwardDiff
using Tullio
Random.seed!(100)
Pkg.add("ZygoteRules")
using ZygoteRules
ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
  d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)
ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
  d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)

#(2) Build model structure
@with_kw struct Model
    #(A) Structural parameters
    β::Float64 =0.96^(1/52)
    ℬ::Float64 = 0.25
    α::Float64 = 39.8352
    φ::Float64 = 0.8
    θ::Float64 = 0.0013
    π1::Float64 = 6.8838e-08
    π2::Float64 = 1.0924e-04
    π3::Float64 = 0.3426
    πr::Float64 = 0.3869
    πd::Float64 = 0.0019
    μ0::Float64 = 0.0
    #(B) Networks and ADAM setting
    S1::Float64 = 0.0
    S2::Float64 = 1.0
    ϰ::Int64 = 2500
    ϑ::Int64 = 150
    T::Int64 = 32
    Γ::Float64 = 0.001
    β1::Float64 = 0.9
    β2::Float64 = 0.99
    ϵ::Float64 = 10^(-8)
    ν::Float32 = 10^(-5)
    𝚰::Int64 = 500000
    𝛀::Float64 = 0.001
    𝒯::Int64 = 1500
    𝓇::Float64 = 0.9
end
Mod1 = Model(Γ=0.001,𝓇=0.99,T=32)

@unpack T,ϰ = Mod1

#(3) Generate grid
function RnGrid(Model)
    @unpack S1,S2,ϰ,πd = Model
    eGrid = rand(Uniform(S1,S2),ϰ^2,3)
    eGrid = reshape(eGrid,3,ϰ^2)
    tGrid = Array{Float64}(undef,ϰ^2)
    n,m = size(eGrid)
    for i in 1:m
        tGrid[i] = sum(eGrid[:,i])
    end
    Ind = findall(x->x<=1.0,tGrid)
    Grid = eGrid[:,Ind]
    Grid = Grid[:,1:ϰ]
    display(scatter3d(Grid[1,:],Grid[2,:],Grid[3,:],
    label="Grid",xlabel="X",ylabel="Y",zlabel="Z",title="RandomGrid"))
    return Grid
end
Grid = RnGrid(Mod1)
Grid = Grid |>gpu

#(4) Initialize neural networks
#(4.1) Policy network
bent(x) = (sqrt(x^2+1)-1)/2 + x
φ = Chain(Dense(3,T,swish),Dense(T,T,swish),
Dense(T,T,swish),Dense(T,1,swish)) |>gpu

Cr = 1104.8296628335625
Nr = 27.73500981126146
W = 8292.589822461223
Ci = 883.8637302668501
Ni = 27.73500981126146
Χ = 8251.57380479189
cr = CUDA.ones(ϰ)*Cr
w = CUDA.ones(ϰ)*W

c(x) = cr' - x[2,:]'.*φ(x)

#(4.2) Value network
ω = Chain(Dense(3,T,swish),Dense(T,T,swish),
Dense(T,T,swish),Dense(T,1,swish)) |>gpu

𝒱(x) = w' - x[2,:]'.*ω(x)

#(5) Build law of motion
function 𝓗(𝛀,Cs,Ns)
    𝓢 = 𝛀[1,:]' - π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' - π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' - π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓘 = (1-πr-πd).*𝛀[2,:]' + π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' + π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' + π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓡 = 𝛀[3,:]' + πr.*𝛀[2,:]'
    𝞨 = vcat(𝓢,𝓘,𝓡)
    return 𝞨
end

#(6) Build residual function
@unpack α,β,θ,π1,π2,π3,πd,πr,ϰ = Mod1
function 𝕽(x)
    Cs = c(x)
    Ns = Cs./α
    V = 𝒱(x)
    𝞨 = 𝓗(x,Cs,Ns)
    𝓥 = 𝒱(𝞨)
    τ = π1*Ci.*Cs.*x[2,:]' + π2*Ni.*Ns.*x[2,:]' + π3*x[2,:]'
    𝓔 = sum((β*(𝓥.-Χ).*(π2*Ni*x[2,:]'+α*π1*Ci.*x[2,:]') - (α./Cs-θ.*Ns)).^2)
    𝓑 = sum((V - log.(Cs) + θ.*(Ns.^2)./2 - β.*(1 .-τ).*𝓥 - β.*τ.*Χ).^2)
    𝓡 = 𝓔 + 𝓑
    return 𝓡
end

𝚯 = Flux.params(φ,ω)

Pkg.add("BenchmarkTools")
using BenchmarkTools

@benchmark 𝕽(Grid)

∇𝕽 = Flux.gradient(()->𝕽(Grid),Flux.params(φ,ω))
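For readers following the code, the law of motion 𝓗 and the residual 𝕽 above transcribe to the equations below (transcribed from the code only). S, I, R are rows 1 to 3 of the state (matching 𝓢, 𝓘, 𝓡 in the code), C^s = Cs = C_r − I·φ(S,I,R), N^s = Ns = C^s/α, C^i = Ci, N^i = Ni, X stands for the constant Χ, and V = 𝒱(S,I,R) = W − I·ω(S,I,R) and V' = 𝒱(S',I',R') are the value network at the current and next state. Reading the superscripts as susceptible/infected consumption and labour is an assumption based on the variable names.

$$
\begin{aligned}
S' &= S - \pi_1 C^s C^i S I - \pi_2 N^s N^i S I - \pi_3 S I,\\
I' &= (1 - \pi_r - \pi_d)\, I + \pi_1 C^s C^i S I + \pi_2 N^s N^i S I + \pi_3 S I,\\
R' &= R + \pi_r I,\\[4pt]
\tau &= \pi_1 C^i C^s I + \pi_2 N^i N^s I + \pi_3 I,\\
\mathcal{E} &= \sum_{\text{grid}} \Big[\beta\,(V' - X)\,(\pi_2 N^i I + \alpha \pi_1 C^i I) - \Big(\frac{\alpha}{C^s} - \theta N^s\Big)\Big]^2,\\
\mathcal{B} &= \sum_{\text{grid}} \Big[V - \log C^s + \frac{\theta\,(N^s)^2}{2} - \beta\,(1-\tau)\,V' - \beta\,\tau X\Big]^2,\\
\mathfrak{R} &= \mathcal{E} + \mathcal{B}.
\end{aligned}
$$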

Best,
Honza

Honza9723 · January 10, 2021, 2:09pm

Any idea why this doesn’t work? I am really puzzled: on the CPU I was able to train this model without any problems, and on the GPU the function itself works, yet I get a differentiation error, and the only change I made is literally |> gpu. @ChrisRackauckas or anybody else, I would be grateful for any idea or guidance on how to debug this.

maxfreu · January 12, 2021, 7:40am

Hi, maybe it would help if you provided a truly minimal working example. The one above is huge; for example, the plots won’t be needed to reproduce your problem.

ChrisRackauckas · January 12, 2021, 7:50am

Indeed, @dhairyagandhi96 was going to look into this one, but if you want someone to look at it sooner, it would be very helpful to minimize it. There are about 15 packages in here, and I don’t think all of them are needed to recreate the error. If you can delete a bunch of stuff and still get the same error, you’ll get a quicker response; otherwise it has to wait until someone has the time to minimize it.

dhairyagandhi96 · January 12, 2021, 8:05am

Yeah, a more minimal example would certainly be better, although I expect this one to be straightforward to fix.

Honza9723 · January 12, 2021, 1:28pm

@ChrisRackauckas @dhairyagandhi96 Thank you for the response, and sorry for posting so much code. I have just solved it. It was a really silly mistake: I forgot that the constants were defined as Float64, which caused the differentiation error. The function itself worked, but differentiating it crashed on dual numbers (Float32(::ForwardDiff.Dual{Nothing,Float64,1})). Now it works fine.
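A small CPU-only sketch of the promotion behaviour behind this fix: Julia promotes Float32 data combined with Float64 constants to Float64, so constants have to be converted explicitly to stay in single precision. Plain `Array`s stand in for `CuArray`s here (assuming broadcasting promotes the same way on the GPU), and `ModelF32` is a hypothetical Float32 variant of the `Model` struct from the first post, using `Base.@kwdef` instead of `Parameters.@with_kw`.

```julia
# Stand-in for data on the GPU: Float32, like CUDA.ones(...) or arrays passed through `gpu`.
x = rand(Float32, 5)

# Constants written as plain literals are Float64 and promote the whole result.
π1, Ci = 6.8838e-08, 883.8637302668501
println(eltype(π1 .* Ci .* x))                    # Float64

# Converting the constants once keeps everything in single precision.
π1f, Cif = Float32.((π1, Ci))
println(eltype(π1f .* Cif .* x))                  # Float32

# Same for scaling an array by a scalar, as in `CUDA.ones(ϰ)*Cr`.
Cr = 1104.8296628335625
println(eltype(ones(Float32, 3) * Cr))            # Float64
println(eltype(ones(Float32, 3) * Float32(Cr)))   # Float32

# Hypothetical Float32 variant of the parameter struct (subset of fields).
# The generated constructor converts the Float64 defaults to Float32.
Base.@kwdef struct ModelF32
    β::Float32 = 0.96^(1/52)
    α::Float32 = 39.8352
    θ::Float32 = 0.0013
    π1::Float32 = 6.8838e-08
    π2::Float32 = 1.0924e-04
    π3::Float32 = 0.3426
    πr::Float32 = 0.3869
    πd::Float32 = 0.0019
end
m = ModelF32()
println(typeof(m.β))                              # Float32
```

Typing the fields keeps call sites unchanged: existing `@unpack`-style code picks up Float32 values without editing each literal.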

I would like to ask one more thing, about performance. How is it possible that benchmarking the residual function gave me a 10X speed-up on the GPU over the CPU, but when I benchmark its gradient, the CPU is 2X faster?

Best,
Honza
