# ForwardDiff of function with internal derivatives using CuArrays

Hi All, I am struggling a bit with a ForwardDiff problem on GPU. I would like to determine the gradients of a loss function which internally uses Zygote to calculate gradients. I can get it to work perfectly on CPU but when using GPU (CUDA.jl) I get the following error message:

``````MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#170#171",Float32},Float32,10})
Closest candidates are:
Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
Float64(::T) where T<:Number at boot.jl:732
``````

Here is the declaration of my parameters:

``````#define architectural parameters
nₙ = 20
nᵢ = 2
nₒ = 4

#define initial parameters
W₁ = cu(rand(nₙ, nᵢ))
W₂ = cu(rand(nₙ, nₙ))
W₃ = cu(rand(nₙ, nₙ))
W₄ = cu(rand(nₒ, nₙ))
b₁ = cu(rand(nₙ))
b₂ = cu(rand(nₙ))
b₃ = cu(rand(nₙ))
b₄ = cu(rand(nₒ))

#push input data to CUDA arrays
x0_t0_cu = cu(x0_t0)
xlb_tlb_cu = cu(xlb_tlb)
xub_tub_cu = cu(xub_tub)
xf_tf_cu = cu(xf_tf)
u0_cu = cu(u0)
v0_cu = cu(v0)
``````

Relevant functions (not that important nothing special happening here):

``````#layer operations
hlayer(W, b, x) = tanh.(W * x .+ b);
olayer(W, b, x) = W * x .+ b;

#forward prop. function
function net(W, b, x)
x = hlayer(W[1], b[1], x)
if length(W) <= 2
x = olayer(W[2], b[2], x)
else
for i in 2:(length(W)-1)
x = hlayer(W[i], b[i], x)
end
x = olayer(W[end], b[end], x)
end
return x
end
``````

Here is the cost function I wish to differentiate:

``````#define loss function
function loss(W₁, W₂, W₃, W₄, b₁, b₂, b₃, b₄,
x0_t0, xlb_tlb, xub_tub, xf_tf,
u0, v0)

W = [W₁, W₂, W₃, W₄]
b = [b₁, b₂, b₃, b₄]

u(x) = net(W, b, x)

#initial conditions
u₀ = u(x0_t0)
u₀pred = u₀[1,:]; v₀pred = u₀[2,:];

#lower boundary condition
uₗ = u(xlb_tlb)
uₗpred = uₗ[1,:]; vₗpred = uₗ[2,:];
∂uₗ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xlb_tlb)[1][1,:]
∂vₗ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xlb_tlb)[1][1,:]

#upper boundary condition
uᵤ = u(xub_tub)
uᵤpred = uᵤ[1,:]; vᵤpred = uᵤ[2,:];
∂uᵤ∂x = Zygote.gradient(a -> sum(u(a)[1,:]), xub_tub)[1][1,:]
∂vᵤ∂x = Zygote.gradient(a -> sum(u(a)[2,:]), xub_tub)[1][1,:]

loss = mean((u₀pred .- u0).^2) + mean((v₀pred .- v0).^2) +
mean((uₗpred .- uᵤpred).^2) + mean((vₗpred .- vᵤpred).^2) +
mean((∂uₗ∂x .- ∂uᵤ∂x).^2) + mean((∂vₗ∂x .- ∂vᵤ∂x).^2)
return loss
end
``````

Here is the code to call the derivative of the loss function wrt to W1 only:

``````ForwardDiff.gradient(x -> loss(x, W₂, W₃, W₄, b₁, b₂, b₃, b₄,
x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
u0_cu, v0_cu), W₁)
``````

The ForwardDiff line works fine, if I do not calculate the gradients within the loss function. Any help will be appreciated sorting out this problem. Regards RL

Hi All, I figured it out… sort of. The problem was with the tanh function call. When replaced with the `CUDA.tanh.` function, it works. But the next problem is that I can’t run the code running without `CUDA.allowscalar(true)`. When I set it to “false” I get the following error.

``````scalar setindex! is disallowed

Stacktrace:
[1] error(::String) at .\error.jl:33
[2] assertscalar(::String) at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:41
[3] setindex! at C:\Users\rynol\.julia\packages\GPUArrays\uaFZh\src\host\indexing.jl:104 [inlined]
[8] top-level scope at timing.jl:174
``````

I have looked through my code carefully and can’t find the line causing the scalar setindex. Again any help will be appreciated.

Can’t help with the other part, but at least I can tell that with the next version of CUDA.jl (or the current master branch) it won’t be necessary to call `CUDA.tanh` anymore, you’ll be able to just use `tanh`.

2 Likes

Never used `ForwardDiff`, but I’ve run into these errors before.
Looking at gradient.jl at line 86 (which is what your error points to):

``````function extract_gradient_chunk!(::Type{T}, result, dual, index, chunksize) where {T}
offset = index - 1
for i in 1:chunksize
result[i + offset] = partials(T, dual, i)
end
return result
end
``````

This is your problem, I believe. If you modify this function (you can probably create your own with CuArrays as inputs) to use broadcasting instead of a for loop, you should be good.

Thanks so much. I will try this and revert back.

Thanks for the response. I have downloaded Julia 1.6.0 and updated all my packages including CUDA (2.6.2). If I run the code again I get a different error when I don’t include the CUDA.tanh prefix. See below. But at least it is still working.

``````CUDNNError: CUDNN_STATUS_NOT_INITIALIZED (code 1)

Stacktrace:
[1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
@ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:19
[2] cudnnCreate()
@ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\base.jl:9
[3] #103
@ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:62 [inlined]
[4] get!(default::CUDA.CUDNN.var"#103#106"{CuContext}, d::IdDict{Any, Any}, key::Any)
@ Base .\iddict.jl:163
[5] handle()
@ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\CUDNN.jl:61
[6] cudnnActivationForward(x::CuArray{Float32, 4}, y::CuArray{Float32, 4}; mode::CUDA.CUDNN.cudnnActivationMode_t, coeff::Bool, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, alpha::Bool, beta::Bool)
@ CUDA.CUDNN ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\activation.jl:27
[7] #79
@ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:239 [inlined]
[8] materialize
@ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\nnlib.jl:265 [inlined]
[10] _pullback
[11] _pullback
@ .\In[53]:4 [inlined]
[12] _pullback(::Zygote.Context, ::typeof(hlayer), ::CuArray{Float32, 2}, ::CuArray{Float32, 1}, ::CuArray{Float32, 2})
@ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
[13] _pullback
@ .\In[53]:10 [inlined]
[14] _pullback
@ .\In[54]:6 [inlined]
[15] _pullback
@ .\In[54]:16 [inlined]
[16] _pullback(ctx::Zygote.Context, f::var"#181#192"{var"#u#191"{CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 2}, CuArray{Float32, 1}, CuArray{Float32, 1}, CuArray{Float32, 1}}}, args::CuArray{Float32, 2})
@ Zygote ~\.julia\packages\Zygote\fjuG8\src\compiler\interface2.jl:0
[17] _pullback
@ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:34 [inlined]
[18] pullback
@ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:40 [inlined]
@ ~\.julia\packages\Zygote\fjuG8\src\compiler\interface.jl:58 [inlined]
[20] loss(W₁::CuArray{Float32, 2}, W₂::CuArray{Float32, 2}, W₃::CuArray{Float32, 2}, b₁::CuArray{Float32, 1}, b₂::CuArray{Float32, 1}, b₃::CuArray{Float32, 1}, x0_t0::CuArray{Float32, 2}, xlb_tlb::CuArray{Float32, 2}, xub_tub::CuArray{Float32, 2}, xf_tf::CuArray{Float32, 2}, u0::Adjoint{Float32, CuArray{Float32, 1}}, v0::Adjoint{Float32, CuArray{Float32, 1}})
@ Main .\In[54]:16
[21] top-level scope
@ .\timing.jl:210 [inlined]
[22] top-level scope
@ .\In[55]:0
[23] eval
@ .\boot.jl:360 [inlined]
[24] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
``````

That shouldn’t happen. Are you running out of GPU memory? CUDNN can be buggy in that case.

The cuDNN module failed when I tested my CUDA package. I have in the meantime removed Julia along with all the packages and installed (Julia 1.6), CUDA 2.3.2, CUDA toolkit 11.1 and accompanying cuDNN lib. Now the `Pkg.test("CUDA")` works except for the `exceptions` item , and my code now runs. But I can’t run it without the `CUDA.tanh.` prefix. If I remove the `CUDA` prefix and run the following:

``````@time ForwardDiff.gradient(x -> loss(x, W₂, W₃, b₁, b₂, b₃,
x0_t0_cu, xlb_tlb_cu, xub_tub_cu, xf_tf_cu,
u0_cu, v0_cu), W₁)
``````

it gives the following error:

``````MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
Closest candidates are:
(::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:200
(::Type{T})(::T) where T<:Number at boot.jl:760
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
...

Stacktrace:
[1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
@ Base .\number.jl:7
[2] cconvert(T::Type, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
@ Base .\essentials.jl:396
[3] macro expansion
@ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\libcudnn.jl:919 [inlined]
[4] macro expansion
@ ~\.julia\packages\CUDA\qEV3Y\src\pool.jl:408 [inlined]
[5] macro expansion
@ ~\.julia\packages\CUDA\qEV3Y\lib\cudnn\error.jl:28 [inlined]
[6] cudnnSetActivationDescriptor(activationDesc::Ptr{Nothing}, mode::CUDA.CUDNN.cudnnActivationMode_t, reluNanOpt::CUDA.CUDNN.cudnnNanPropagation_t, coef::ForwardDiff.Dual{ForwardDiff.Tag{var"#45#46", Float32}, Float32, 10})
``````

But I don’t want to waste your time any further, I will build my own implementation of ForwardDiff (maybe complex step diff.) which is CUDA friendly. Thanks again for the posts.

That’s expected, the fix for that only lives on the CUDA.jl development branch, and isn’t part of a released version yet.

1 Like