ForwardDiff of function with internal derivatives using CuArrays

RynoLaubscher · March 29, 2021, 5:52pm

Hi All, I am struggling a bit with a ForwardDiff problem on GPU. I would like to determine the gradients of a loss function which internally uses Zygote to calculate gradients. I can get it to work perfectly on CPU but when using GPU (CUDA.jl) I get the following error message:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#170#171",Float32},Float32,10})
Closest candidates are:
  Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:732
  Float64(::Irrational{:mad_constant}) at irrationals.jl:189

Here is the declaration of my parameters:

#define architectural parameters
nₙ = 20
nᵢ = 2
nₒ = 4

#define initial parameters
W₁ = cu(rand(nₙ, nᵢ))
W₂ = cu(rand(nₙ, nₙ))
W₃ = cu(rand(nₙ, nₙ))
W₄ = cu(rand(nₒ, nₙ))
b₁ = cu(rand(nₙ))
b₂ = cu(rand(nₙ))
b₃ = cu(rand(nₙ))
b₄ = cu(rand(nₒ))

#push input data to CUDA arrays
x0_t0_cu = cu(x0_t0)
xlb_tlb_cu = cu(xlb_tlb)
xub_tub_cu = cu(xub_tub)
xf_tf_cu = cu(xf_tf)
u0_cu = cu(u0)
v0_cu = cu(v0)

Relevant functions (not that important nothing special happening here):

#layer operations
hlayer(W, b, x) = tanh.(W * x .+ b);
olayer(W, b, x) = W * x .+ b;

#forward prop. function
function net(W, b, x)
    x = hlayer(W[1], b[1], x)
    if length(W) <= 2
        x = olayer(W[2], b[2], x)
    else
        for i in 2:(length(W)-1)
            x = hlayer(W[i], b[i], x)
        end
        x = olayer(W[end], b[end], x)
    end
    return x
end

Here is the cost function I wish to differentiate:

#define loss function
function loss(W₁, W₂, W₃, W₄, b₁, b₂, b₃, b₄, 
              x0_t0, xlb_tlb, xub_tub, xf_tf,
              u0, v0)
    
    W = [W₁, W₂, W₃, W₄]
    b = [b₁, b₂, b₃, b₄]
    
    u(x) = net(W, b, x)
    
    #initial conditions
    u₀ = u(x0_t0)
    u₀pred = u₀[1,:]; v₀pred = u₀[2,:];
    
    #lower boundary condition
    uₗ = u(xlb_tlb)
    uₗpred = uₗ[1,:]; vₗpred = uₗ[2,:];
    ∂uₗ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xlb_tlb)[1][1,:]
    ∂vₗ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xlb_tlb)[1][1,:]
    
    #upper boundary condition
    uᵤ = u(xub_tub)
    uᵤpred = uᵤ[1,:]; vᵤpred = uᵤ[2,:];
    ∂uᵤ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xub_tub)[1][1,:]
    ∂vᵤ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xub_tub)[1][1,:]
    
    loss = mean((u₀pred .- u0).^2) + mean((v₀pred .- v0).^2) +
           mean((uₗpred .- uᵤpred).^2) + mean((vₗpred .- vᵤpred).^2) +
           mean((∂uₗ∂x .- ∂uᵤ∂x).^2) + mean((∂vₗ∂x .- ∂vᵤ∂x).^2)
    return loss
end

Here is the code to call the derivative of the loss function wrt to W1 only:

ForwardDiff.gradient(x -> loss(x, W₂, W₃, W₄, b₁, b₂, b₃, b₄, 
              x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
              u0_cu, v0_cu), W₁)

The ForwardDiff line works fine, if I do not calculate the gradients within the loss function. Any help will be appreciated sorting out this problem. Regards RL

RynoLaubscher · March 30, 2021, 12:49pm

Hi All, I figured it out… sort of. The problem was with the tanh function call. When replaced with the CUDA.tanh. function, it works. But the next problem is that I can’t run the code running without CUDA.allowscalar(true). When I set it to “false” I get the following error.

scalar setindex! is disallowed

Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] assertscalar(::String) at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:41
 [3] setindex! at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:104 [inlined]
 [4] extract_gradient_chunk!(::Type{ForwardDiff.Tag{var"#151#152",Float32}}, ::CuArray{Float64,2}, ::ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float64,10}, ::Int64, ::Int64) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:86
 [5] chunk_mode_gradient(::var"#151#152", ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:152
 [6] gradient(::Function, ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}, ::Val{true}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:21
 [7] gradient(::Function, ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:17 (repeats 2 times)
 [8] top-level scope at timing.jl:174
 [9] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

I have looked through my code carefully and can’t find the line causing the scalar setindex. Again any help will be appreciated.

maleadt · March 30, 2021, 12:51pm

Can’t help with the other part, but at least I can tell that with the next version of CUDA.jl (or the current master branch) it won’t be necessary to call CUDA.tanh anymore, you’ll be able to just use tanh.

Ribeiro · March 30, 2021, 2:06pm

Never used ForwardDiff, but I’ve run into these errors before.
Looking at gradient.jl at line 86 (which is what your error points to):

function extract_gradient_chunk!(::Type{T}, result, dual, index, chunksize) where {T}
    offset = index - 1
    for i in 1:chunksize
        result[i + offset] = partials(T, dual, i)
    end
    return result
end

This is your problem, I believe. If you modify this function (you can probably create your own with CuArrays as inputs) to use broadcasting instead of a for loop, you should be good.

RynoLaubscher · March 30, 2021, 7:02pm

Thanks so much. I will try this and revert back.

RynoLaubscher · March 30, 2021, 7:09pm

Thanks for the response. I have downloaded Julia 1.6.0 and updated all my packages including CUDA (2.6.2). If I run the code again I get a different error when I don’t include the CUDA.tanh prefix. See below. But at least it is still working.

CUDNNError: CUDNN_STATUS_NOT_INITIALIZED (code 1)

Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:19
  [2] cudnnCreate()
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\base.jl:9
  [3] #103
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:62 [inlined]
  [4] get!(default::CUDA.CUDNN.var"#103#106"{CuContext}, d::IdDict{Any, Any}, key::Any)
    @ Base .\iddict.jl:163
  [5] handle()
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:61
  [6] cudnnActivationForward(x::CuArray{Float32, 4}, y::CuArray{Float32, 4}; mode::CUDA.CUDNN.cudnnActivationMode_t, coeff::Bool, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, alpha::Bool, beta::Bool)
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\activation.jl:27
  [7] #79
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:239 [inlined]
  [8] materialize
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:265 [inlined]
  [9] adjoint
    @ ~\.julia\packages\Zygote\fjuG8\src\lib\broadcast.jl:90 [inlined]
 [10] _pullback
    @ ~\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:57 [inlined]
 [11] _pullback
    @ .\In[53]:4 [inlined]
 [12] _pullback(::Zygote.Context, ::typeof(hlayer), ::CuArray{Float32, 2}, ::CuArray{Float32, 1}, ::CuArray{Float32, 2})
    @ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
 [13] _pullback
    @ .\In[53]:10 [inlined]
 [14] _pullback
    @ .\In[54]:6 [inlined]
 [15] _pullback
    @ .\In[54]:16 [inlined]
 [16] _pullback(ctx::Zygote.Context, f::var"#181#192"{var"#u#191"{CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 1}, CuArray{Float32, 1}, CuArray{Float32, 1}}}, args::CuArray{Float32, 2})
    @ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
 [17] _pullback
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:34 [inlined]
 [18] pullback
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:40 [inlined]
 [19] gradient
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:58 [inlined]
 [20] loss(W₁::CuArray{Float32, 2}, W₂::CuArray{Float32, 2}, W₃::CuArray{Float32, 2}, b₁::CuArray{Float32, 1}, b₂::CuArray{Float32, 1}, b₃::CuArray{Float32, 1}, x0_t0::CuArray{Float32, 2}, xlb_tlb::CuArray{Float32, 2}, xub_tub::CuArray{Float32, 2}, xf_tf::CuArray{Float32, 2}, u0::Adjoint{Float32, CuArray{Float32, 1}}, v0::Adjoint{Float32, CuArray{Float32, 1}})
    @ Main .\In[54]:16
 [21] top-level scope
    @ .\timing.jl:210 [inlined]
 [22] top-level scope
    @ .\In[55]:0
 [23] eval
    @ .\boot.jl:360 [inlined]
 [24] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1094

maleadt · March 31, 2021, 6:00am

That shouldn’t happen. Are you running out of GPU memory? CUDNN can be buggy in that case.

RynoLaubscher · March 31, 2021, 11:59am

The cuDNN module failed when I tested my CUDA package. I have in the meantime removed Julia along with all the packages and installed (Julia 1.6), CUDA 2.3.2, CUDA toolkit 11.1 and accompanying cuDNN lib. Now the Pkg.test("CUDA") works except for the exceptions item , and my code now runs. But I can’t run it without the CUDA.tanh. prefix. If I remove the CUDA prefix and run the following:

@time ForwardDiff.gradient(x -> loss(x, W₂, W₃, b₁, b₂, b₃, 
              x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
              u0_cu, v0_cu), W₁)

it gives the following error:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:760
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  ...

Stacktrace:
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
    @ Base .\number.jl:7
  [2] cconvert(T::Type, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
    @ Base .\essentials.jl:396
  [3] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\libcudnn.jl:919 [inlined]
  [4] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\src\pool.jl:408 [inlined]
  [5] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:28 [inlined]
  [6] cudnnSetActivationDescriptor(activationDesc::Ptr{Nothing}, mode::CUDA.CUDNN.cudnnActivationMode_t, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, coef::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})

But I don’t want to waste your time any further, I will build my own implementation of ForwardDiff (maybe complex step diff.) which is CUDA friendly. Thanks again for the posts.

maleadt · March 31, 2021, 12:13pm

That’s expected, the fix for that only lives on the CUDA.jl development branch, and isn’t part of a released version yet.

Topic		Replies	Views
CUDA.jl Differentiation Error GPU gpu , cuda , differentiation , flux	5	860	January 12, 2021
How to use ForwardDiff with CuArrays GPU question , performance , differentiation , forwarddiff	11	2170	January 26, 2022
MethodError: no method matching getrf! when I try to use GPU to training a universal differential equation with ForwardDiff.jl Optimization (Mathematical) question	5	294	December 6, 2022
Float64 error when using ForwardDiff.gradient for nested functions Optimization (Mathematical)	0	334	November 17, 2020
ForwardDiff failing on simple polynomial New to Julia	1	940	September 28, 2017

ForwardDiff of function with internal derivatives using CuArrays

Related topics