ForwardDiff of function with internal derivatives using CuArrays

Hi all, I am struggling a bit with a ForwardDiff problem on the GPU. I would like to determine the gradient of a loss function that internally uses Zygote to calculate gradients. It works perfectly on the CPU, but on the GPU (CUDA.jl) I get the following error message:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#170#171",Float32},Float32,10})
Closest candidates are:
  Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:732
  Float64(::Irrational{:mad_constant}) at irrationals.jl:189

Here is the declaration of my parameters:

#define architectural parameters
nₙ = 20
nᵢ = 2
nₒ = 4

#define initial parameters
W₁ = cu(rand(nₙ, nᵢ))
W₂ = cu(rand(nₙ, nₙ))
W₃ = cu(rand(nₙ, nₙ))
W₄ = cu(rand(nₒ, nₙ))
b₁ = cu(rand(nₙ))
b₂ = cu(rand(nₙ))
b₃ = cu(rand(nₙ))
b₄ = cu(rand(nₒ))

#push input data to CUDA arrays
x0_t0_cu = cu(x0_t0)
xlb_tlb_cu = cu(xlb_tlb)
xub_tub_cu = cu(xub_tub)
xf_tf_cu = cu(xf_tf)
u0_cu = cu(u0)
v0_cu = cu(v0)

Relevant functions (nothing special happening here):

#layer operations
hlayer(W, b, x) = tanh.(W * x .+ b);
olayer(W, b, x) = W * x .+ b;

#forward prop. function
function net(W, b, x)
    x = hlayer(W[1], b[1], x)
    if length(W) <= 2
        x = olayer(W[2], b[2], x)
    else
        for i in 2:(length(W)-1)
            x = hlayer(W[i], b[i], x)
        end
        x = olayer(W[end], b[end], x)
    end
    return x
end
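
For context, net gets called roughly like this (the batch size of 100 below is just for illustration, not my actual data):

#illustrative call: a batch of nᵢ-dimensional inputs in, nₒ outputs per column out
x_demo = cu(rand(Float32, nᵢ, 100))
W_all = [W₁, W₂, W₃, W₄]
b_all = [b₁, b₂, b₃, b₄]
y_demo = net(W_all, b_all, x_demo)    #size(y_demo) == (nₒ, 100)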

Here is the cost function I wish to differentiate:

#define loss function
function loss(W₁, W₂, W₃, W₄, b₁, b₂, b₃, b₄, 
              x0_t0, xlb_tlb, xub_tub, xf_tf,
              u0, v0)
    
    W = [W₁, W₂, W₃, W₄]
    b = [b₁, b₂, b₃, b₄]
    
    u(x) = net(W, b, x)
    
    #initial conditions
    u₀ = u(x0_t0)
    u₀pred = u₀[1,:]; v₀pred = u₀[2,:];
    
    #lower boundary condition
    uₗ = u(xlb_tlb)
    uₗpred = uₗ[1,:]; vₗpred = uₗ[2,:];
    ∂uₗ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xlb_tlb)[1][1,:]
    ∂vₗ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xlb_tlb)[1][1,:]
    
    #upper boundary condition
    uᵤ = u(xub_tub)
    uᵤpred = uᵤ[1,:]; vᵤpred = uᵤ[2,:];
    ∂uᵤ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xub_tub)[1][1,:]
    ∂vᵤ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xub_tub)[1][1,:]
    
    loss = mean((u₀pred .- u0).^2) + mean((v₀pred .- v0).^2) +
           mean((uₗpred .- uᵤpred).^2) + mean((vₗpred .- vᵤpred).^2) +
           mean((∂uₗ∂x .- ∂uᵤ∂x).^2) + mean((∂vₗ∂x .- ∂vᵤ∂x).^2)
    return loss
end

Here is the code that calls the derivative of the loss function with respect to W₁ only:

ForwardDiff.gradient(x -> loss(x, W₂, W₃, W₄, b₁, b₂, b₃, b₄, 
              x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
              u0_cu, v0_cu), W₁)
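
For completeness, this is the nesting in miniature (a CPU-only toy, not my actual code); with the packages above, this sort of nested ForwardDiff-over-Zygote call goes through fine on the CPU:

#CPU-only toy with the same structure: Zygote gradient inside, ForwardDiff outside
using ForwardDiff, Zygote, Statistics

Wc = rand(Float32, 4, 2)
xc = rand(Float32, 2, 8)

inner(W, x) = mean(Zygote.gradient(a -> sum(tanh.(W * a)), x)[1])

ForwardDiff.gradient(w -> inner(w, xc), Wc)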

The ForwardDiff call works fine if I do not calculate the gradients within the loss function. Any help sorting out this problem will be appreciated. Regards, RL

Hi all, I figured it out… sort of. The problem was with the tanh function call: when I replace it with CUDA.tanh, it works. But the next problem is that I can’t run the code without CUDA.allowscalar(true). When I set it to false, I get the following error:

scalar setindex! is disallowed

Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] assertscalar(::String) at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:41
 [3] setindex! at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:104 [inlined]
 [4] extract_gradient_chunk!(::Type{ForwardDiff.Tag{var"#151#152",Float32}}, ::CuArray{Float64,2}, ::ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float64,10}, ::Int64, ::Int64) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:86
 [5] chunk_mode_gradient(::var"#151#152", ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:152
 [6] gradient(::Function, ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}, ::Val{true}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:21
 [7] gradient(::Function, ::CuArray{Float32,2}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#151#152",Float32},Float32,10,CuArray{ForwardDiff.Dual{ForwardDiff.Tag{var"#151#152",Float32},Float32,10},2}}) at C:\Users\rynol\.julia\packages\ForwardDiff\sqhTO\src\gradient.jl:17 (repeats 2 times)
 [8] top-level scope at timing.jl:174
 [9] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

I have looked through my code carefully and can’t find the line causing the scalar setindex!. Again, any help will be appreciated.
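
(For reference, the CUDA.tanh change mentioned above was just this one line in the hidden layer:)

#workaround for the first error: call the CUDA intrinsic explicitly
hlayer(W, b, x) = CUDA.tanh.(W * x .+ b);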

Can’t help with the other part, but I can at least tell you that with the next version of CUDA.jl (or the current master branch) it won’t be necessary to call CUDA.tanh anymore; you’ll be able to just use tanh.


I’ve never used ForwardDiff, but I’ve run into these errors before.
Looking at gradient.jl line 86 (which is what your stacktrace points to):

function extract_gradient_chunk!(::Type{T}, result, dual, index, chunksize) where {T}
    offset = index - 1
    for i in 1:chunksize
        result[i + offset] = partials(T, dual, i)
    end
    return result
end

This is your problem, I believe. If you modify this function (you can probably define your own method that takes a CuArray as input) to avoid the scalar for loop, e.g. by broadcasting, you should be good.
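
Very roughly, something like this defined in your own code might do it. It is untested, it relies on ForwardDiff internals (so it may break between versions), and it uses a single bulk copyto! rather than broadcasting, which avoids the scalar loop all the same:

using ForwardDiff, CUDA

#untested sketch: gather the chunk of partials on the CPU, then write them to the
#GPU array with one bulk copy instead of the scalar setindex! loop
function ForwardDiff.extract_gradient_chunk!(::Type{T}, result::CuArray, dual,
                                             index, chunksize) where {T}
    vals = [convert(eltype(result), ForwardDiff.partials(T, dual, i)) for i in 1:chunksize]
    copyto!(result, index, vals, 1, chunksize)
    return result
end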

Thanks so much. I will try this and report back.

Thanks for the response. I have downloaded Julia 1.6.0 and updated all my packages, including CUDA.jl (2.6.2). When I run the code again without the CUDA.tanh prefix, I get a different error (see below), but at least it is still working with the prefix.

CUDNNError: CUDNN_STATUS_NOT_INITIALIZED (code 1)

Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:19
  [2] cudnnCreate()
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\base.jl:9
  [3] #103
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:62 [inlined]
  [4] get!(default::CUDA.CUDNN.var"#103#106"{CuContext}, d::IdDict{Any, Any}, key::Any)
    @ Base .\iddict.jl:163
  [5] handle()
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:61
  [6] cudnnActivationForward(x::CuArray{Float32, 4}, y::CuArray{Float32, 4}; mode::CUDA.CUDNN.cudnnActivationMode_t, coeff::Bool, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, alpha::Bool, beta::Bool)
    @ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\activation.jl:27
  [7] #79
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:239 [inlined]
  [8] materialize
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:265 [inlined]
  [9] adjoint
    @ ~\.julia\packages\Zygote\fjuG8\src\lib\broadcast.jl:90 [inlined]
 [10] _pullback
    @ ~\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:57 [inlined]
 [11] _pullback
    @ .\In[53]:4 [inlined]
 [12] _pullback(::Zygote.Context, ::typeof(hlayer), ::CuArray{Float32, 2}, ::CuArray{Float32, 1}, ::CuArray{Float32, 2})
    @ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
 [13] _pullback
    @ .\In[53]:10 [inlined]
 [14] _pullback
    @ .\In[54]:6 [inlined]
 [15] _pullback
    @ .\In[54]:16 [inlined]
 [16] _pullback(ctx::Zygote.Context, f::var"#181#192"{var"#u#191"{CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 1}, CuArray{Float32, 1}, CuArray{Float32, 1}}}, args::CuArray{Float32, 2})
    @ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
 [17] _pullback
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:34 [inlined]
 [18] pullback
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:40 [inlined]
 [19] gradient
    @ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:58 [inlined]
 [20] loss(W₁::CuArray{Float32, 2}, W₂::CuArray{Float32, 2}, W₃::CuArray{Float32, 2}, b₁::CuArray{Float32, 1}, b₂::CuArray{Float32, 1}, b₃::CuArray{Float32, 1}, x0_t0::CuArray{Float32, 2}, xlb_tlb::CuArray{Float32, 2}, xub_tub::CuArray{Float32, 2}, xf_tf::CuArray{Float32, 2}, u0::Adjoint{Float32, CuArray{Float32, 1}}, v0::Adjoint{Float32, CuArray{Float32, 1}})
    @ Main .\In[54]:16
 [21] top-level scope
    @ .\timing.jl:210 [inlined]
 [22] top-level scope
    @ .\In[55]:0
 [23] eval
    @ .\boot.jl:360 [inlined]
 [24] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1094

That shouldn’t happen. Are you running out of GPU memory? CUDNN can be buggy in that case.
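
If you want to check, CUDA.jl can report its memory use, e.g.:

using CUDA

CUDA.memory_status()   #prints how much GPU memory is currently in use vs. available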

The cuDNN module failed when I tested my CUDA package. In the meantime I have removed Julia along with all the packages and installed Julia 1.6, CUDA.jl 2.3.2, CUDA toolkit 11.1 and the accompanying cuDNN library. Now Pkg.test("CUDA") passes except for the exceptions item :rofl:, and my code runs. But I still can’t run it without the CUDA.tanh prefix. If I remove the CUDA prefix and run the following:

@time ForwardDiff.gradient(x -> loss(x, W₂, W₃, b₁, b₂, b₃, 
              x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
              u0_cu, v0_cu), W₁)

it gives the following error:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  (::Type{T})(::T) where T<:Number at boot.jl:760
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  ...

Stacktrace:
  [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
    @ Base .\number.jl:7
  [2] cconvert(T::Type, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
    @ Base .\essentials.jl:396
  [3] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\libcudnn.jl:919 [inlined]
  [4] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\src\pool.jl:408 [inlined]
  [5] macro expansion
    @ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:28 [inlined]
  [6] cudnnSetActivationDescriptor(activationDesc::Ptr{Nothing}, mode::CUDA.CUDNN.cudnnActivationMode_t, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, coef::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})

But I don’t want to waste your time any further. Thanks again for the posts. I will instead build my own CUDA-friendly forward-differentiation routine (maybe complex-step differentiation).
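
In its simplest scalar form, the complex-step trick is just (a sketch, not the eventual implementation):

#f'(x) ≈ imag(f(x + im*h)) / h for tiny h, with no subtractive cancellation
complex_step(f, x; h = 1e-20) = imag(f(x + im*h)) / h

complex_step(sin, 1.0)   #≈ cos(1.0) ≈ 0.5403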

That’s expected; the fix for that only lives on the CUDA.jl development branch and isn’t part of a released version yet.
