CUDNNError when using Flux within a Task

I have been trying to release a Flux backend for my AlphaZero.jl library, but I am encountering a mysterious CUDNN error (CUDNN_STATUS_INTERNAL_ERROR).

Here is some info:

  • The error happens consistently on every run, after about a minute. It never occurs at exactly the same point, and it always strikes during an inference query of the same kind that had already run successfully thousands of times before.
  • The error only happens when network inference runs in an asynchronous Julia Task, which is the case when using multiple MCTS workers (see the sketch after this list). Tasks are not run in parallel, as there is a global lock; also, only one task performs inference and accesses the GPU.
  • The error only happens when convolution layers are used (at least, it does not happen when the network is replaced by an MLP).
  • The error happens with both CuArrays’ splitting pool and binned pool (although, on average, it happens slightly sooner with the binned pool).
  • The error happens with Flux but not with Knet.
  • When the error happens, GPU memory is not necessarily full (there was at least 200MB free in all my experiments).
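
For context, the inference server looks roughly like this: a minimal sketch with illustrative names, not the actual AlphaZero.jl code. The 7×6 spatial size with 3 input channels matches the Connect Four encoding visible in the stack trace below; everything else is a stand-in.

using Flux, CuArrays

# Stand-in convolutional model; the real network is a ResNet.
network = gpu(Chain(Conv((3, 3), 3 => 64, relu; pad = 1)))

# One server task owns the GPU; MCTS workers send it batches of states
# over a Channel and block on a per-request reply Channel.
requests = Channel{Tuple{Array{Float32,4},Channel{Any}}}(32)

server = @async for (states, reply) in requests
    # This is the kind of query that succeeds thousands of times
    # before the CUDNN error eventually fires.
    put!(reply, cpu(network(gpu(states))))
end

# A worker submits a 7×6×3×batch tensor and blocks on the reply:
reply = Channel{Any}(1)
put!(requests, (rand(Float32, 7, 6, 3, 16), reply))
result = take!(reply)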

Does anyone have a hypothesis about what’s happening here? Any hint that could help me build a minimal working example would be welcome! :slight_smile:

My config

  • Julia 1.4.1, CUDAapi v4.0.0, CuArrays 2.2.0, Flux v0.10.4
  • Nvidia RTX 2070 (8GB)

To replicate

The bug can be replicated as follows.

git clone -b flux-bug git@github.com:jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --color=yes --project scripts/alphazero.jl --game connect-four train

After about a minute, I get the following:

Initializing a new AlphaZero environment

  Initial report
  
    Number of network parameters: 620,552
    Number of regularized network parameters: 617,408
    Memory footprint per MCTS node: 380 bytes
  
  Running benchmark: AlphaZero against MCTS (1000 rollouts)
  
    Progress:  22%|███████████▏                           |  ETA: 0:04:02

CUDNNError: CUDNN_STATUS_INTERNAL_ERROR (code 4)
Stacktrace:
 [1] throw_api_error(::CuArrays.CUDNN.cudnnStatus_t) at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/error.jl:19
 [2] macro expansion at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/error.jl:30 [inlined]
 [3] cudnnCreate(::Base.RefValue{Ptr{Nothing}}) at /home/jonathan/.julia/packages/CUDAapi/XuSHC/src/call.jl:93
 [4] cudnnCreate at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/base.jl:3 [inlined]
 [5] #515 at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/CUDNN.jl:50 [inlined]
 [6] get!(::CuArrays.CUDNN.var"#515#518"{CUDAdrv.CuContext}, ::IdDict{Any,Any}, ::Any) at ./abstractdict.jl:663
 [7] handle() at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/CUDNN.jl:49
 [8] macro expansion at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/utils.jl:36 [inlined]
 [9] cudnnConvolutionForward(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; algo::Int64, alpha::Int64, beta::Int64) at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/conv.jl:72
 [10] conv!(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; alpha::Int64, algo::Int64) at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/nnlib.jl:61
 [11] conv! at /home/jonathan/.julia/packages/CuArrays/l0gXB/src/dnn/nnlib.jl:58 [inlined]
 [12] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/NNlib/FAI3o/src/conv.jl:116
 [13] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}) at /home/jonathan/.julia/packages/NNlib/FAI3o/src/conv.jl:114
 [14] (::Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}})(::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/conv.jl:61
 [15] applychain(::Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}, ::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36
 [16] (::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}})(::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:38
 [17] forward(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/test/AlphaZero.jl/src/networks/flux.jl:184
 [18] evaluate(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/jonathan/test/AlphaZero.jl/src/networks/network.jl:288
 [19] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at /home/jonathan/test/AlphaZero.jl/src/networks/network.jl:313
 [20] macro expansion at ./util.jl:308 [inlined]
 [21] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at /home/jonathan/test/AlphaZero.jl/src/mcts.jl:409
 [22] macro expansion at /home/jonathan/test/AlphaZero.jl/src/util.jl:64 [inlined]
 [23] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at ./task.jl:358

I had a similar problem when trying to prefetch a training batch in another thread while calculating gradients. The error didn’t show up immediately either, only after a few hundred updates. I really wish someone could investigate this.
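
Roughly, the pattern was the following (a simplified sketch; the model, loss, and data below are stand-ins for my actual setup):

using Flux

# Stand-ins for my actual model and data:
model = gpu(Chain(Dense(10, 10, relu), Dense(10, 1)))
opt   = ADAM()
loss(x, y) = Flux.mse(model(x), y)
data  = [(rand(Float32, 10, 32), rand(Float32, 1, 32)) for _ in 1:100]

batches = Channel(2)

# Prefetch thread: stage the next batch on the GPU while training continues.
Threads.@spawn begin
    for (x, y) in data
        put!(batches, (gpu(x), gpu(y)))
    end
    close(batches)
end

# Training loop: compute gradients on the current batch.
for (x, y) in batches
    gs = gradient(() -> loss(x, y), Flux.params(model))
    Flux.Optimise.update!(opt, Flux.params(model), gs)
end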

This is very interesting. My model also uses conv layers.

Multi-tasking and threading are fairly recent additions to the Julia/CUDA stack, so some bugs are to be expected. Importantly, every task gets its own CUDNN handle, so you can’t leak handle-local data between tasks (a sketch of that per-task caching follows the patch below). But the backtrace here seems to point to where that handle gets created; I assume that’s the first CUDNN operation in a newly-created task? (If not, something’s up with handle creation.) Maybe there’s a limit on how many handles we can create. 200MB free also isn’t much, so maybe creation fails because it runs out of memory and we need to retry after running a GC iteration. You can try that with the following patch:

--- a/lib/cudnn/base.jl
+++ b/lib/cudnn/base.jl
@@ -1,6 +1,9 @@
 function cudnnCreate()
     handle = Ref{cudnnHandle_t}()
-    cudnnCreate(handle)
+    res = @retry_reclaim CUDNN_STATUS_INTERNAL_ERROR unsafe_cudnnCreate(handle)
+    if res != CUDNN_STATUS_SUCCESS
+        throw_api_error(res)
+    end
     return handle[]
 end
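
For reference, frames [5]–[7] of the backtrace correspond roughly to this per-task handle caching (a simplified sketch, not the exact CuArrays code; context() stands in for however the current CUDA context is looked up):

function handle()
    ctx = context()   # the task's active CUDA context
    # Each task caches its own handle; the first CUDNN call in a new
    # task falls through to cudnnCreate(), which is what fails here.
    get!(task_local_storage(), (:CUDNN, ctx)) do
        cudnnCreate()
    end
end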

Any update here?

@maleadt Sorry, I had missed your answer!

So I tried the following patch on top of CuArrays 2.2.1, as you suggested:

diff --git a/src/dnn/base.jl b/src/dnn/base.jl
index 12fcc84..39c0e6b 100644
--- a/src/dnn/base.jl
+++ b/src/dnn/base.jl
@@ -1,6 +1,10 @@
 function cudnnCreate()
     handle = Ref{cudnnHandle_t}()
-    cudnnCreate(handle)
+    println("Call to cudnnCreate")
+    res = @retry_reclaim CUDNN_STATUS_INTERNAL_ERROR unsafe_cudnnCreate(handle)
+    if res != CUDNN_STATUS_SUCCESS
+         throw_api_error(res)
+    end
     return handle[]
 end

However, it does not seem to make any difference:

Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
    Progress:  28%|█████████████████████████████████                                                                                   |  ETA: 0:03:45Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
Call to cudnnCreate
CUDNNError: CUDNN_STATUS_NOT_INITIALIZED (code 1)
Stacktrace:
 [1] throw_api_error(::CuArrays.CUDNN.cudnnStatus_t) at /home/jonathan/.julia/dev/CuArrays/src/dnn/error.jl:19
 [2] cudnnCreate() at /home/jonathan/.julia/dev/CuArrays/src/dnn/base.jl:6
 [3] #514 at /home/jonathan/.julia/dev/CuArrays/src/dnn/CUDNN.jl:50 [inlined]
 [4] get!(::CuArrays.CUDNN.var"#514#517"{CUDAdrv.CuContext}, ::IdDict{Any,Any}, ::Any) at ./abstractdict.jl:663
 [5] handle() at /home/jonathan/.julia/dev/CuArrays/src/dnn/CUDNN.jl:49
 [6] macro expansion at /home/jonathan/.julia/dev/CuArrays/src/utils.jl:36 [inlined]
 [7] cudnnConvolutionForward(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; algo::Int64, alpha::Int64, beta::Int64) at /home/jonathan/.julia/dev/CuArrays/src/dnn/conv.jl:60
 [8] conv!(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; alpha::Int64, algo::Int64) at /home/jonathan/.julia/dev/CuArrays/src/dnn/nnlib.jl:61
 [9] conv! at /home/jonathan/.julia/dev/CuArrays/src/dnn/nnlib.jl:58 [inlined]
 [10] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/jonathan/.julia/packages/NNlib/FAI3o/src/conv.jl:116
 [11] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}) at /home/jonathan/.julia/packages/NNlib/FAI3o/src/conv.jl:114
 [12] (::Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}})(::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/conv.jl:61
 [13] applychain(::Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}, ::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36
 [14] (::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}})(::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:38
 [15] forward(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}) at /home/jonathan/AlphaZero.jl/src/networks/flux.jl:184
 [16] evaluate(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,2,Nothing}) at /home/jonathan/AlphaZero.jl/src/networks/network.jl:288
 [17] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at /home/jonathan/AlphaZero.jl/src/networks/network.jl:313
 [18] macro expansion at ./util.jl:308 [inlined]
 [19] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at /home/jonathan/AlphaZero.jl/src/mcts.jl:409
 [20] macro expansion at /home/jonathan/AlphaZero.jl/src/util.jl:64 [inlined]
 [21] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at ./task.jl:358

Thanks for your suggestion though! This issue is non-blocking for me (the Knet backend of AlphaZero.jl still works), so I will just keep trying with every new release of CuArrays. I’ll let you know if the problem gets solved. Also, don’t hesitate to tell me if I can help in any other way. :slight_smile:

Interesting! Please tell me if you find a fix for your similar problem. :slight_smile:
I will also keep you updated.

This is a different error… Which CUDNN are you using: from artifacts, or your own version? If the latter, is it matched up with your CUDA toolkit and driver?

Oh, I hadn’t even noticed the error was different indeed!

I installed CUDNN by downloading the following package from Nvidia’s website: libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb. It matches my CUDA version (10.2).
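
(For completeness, the usual installation command for such a package is the standard dpkg invocation; the exact filename is the one above:)

sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb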

To be sure, I reinstalled CUDNN along with my Julia distribution. I am getting the same error: CUDNN_STATUS_NOT_INITIALIZED.

Your CUDA driver and toolkit?

Could you try using artifacts?

I downloaded a single runfile from Nvidia’s website (cuda_10.2.89_440.33.01_linux.run), which installed both the CUDA toolkit and my GPU driver.

Do you have a pointer to an explanation on how to force CuArrays to use artifacts rather than the local CUDA installation?

Yes, do nothing :slightly_smiling_face: Artifacts are used by default, unless your system is unsupported. Verify by running with JULIA_DEBUG=CUDA (assuming you’re using CUDA.jl).

I do not have CUDA.jl as a direct dependency in my project.
I tried rerunning my program after executing “export JULIA_DEBUG=CUDA”, but it did not change the output. Did I misunderstand your advice?

Also, in case you are interested, it should be fairly easy to replicate the bug on your own machine:

git clone -b flux-bug git@github.com:jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --color=yes --project scripts/alphazero.jl --game connect-four train

If you do not observe the same bug, that would be a sign that something is wrong with my setup.

If you’re not using CUDA.jl but CuArrays.jl, you should set JULIA_DEBUG=CuArrays instead.
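
For example, relaunching the replication script with debug logging enabled (assuming a bash-like shell, where the variable can be set inline for a single run):

JULIA_DEBUG=CuArrays julia --color=yes --project scripts/alphazero.jl --game connect-four train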

OK, so I am indeed currently using artifacts:

┌ Debug: Trying to use artifacts...
└ @ CuArrays ~/.julia/dev/CuArrays/src/bindeps.jl:84
┌ Debug: Using CUDA 10.2.89 from an artifact at /home/jonathan/.julia/artifacts/93956fcdec9ac5ea76289d25066f02c2f4ebe56e
└ @ CuArrays ~/.julia/dev/CuArrays/src/bindeps.jl:130
┌ Debug: Using CUDNN from an artifact at /home/jonathan/.julia/artifacts/583aee6a50385a6636638b2d170626ad74b44317
└ @ CuArrays ~/.julia/dev/CuArrays/src/bindeps.jl:188
┌ Debug: Using CUTENSOR from an artifact at /home/jonathan/.julia/artifacts/2efa5337c181b4a3883d8dcbd4e1bc3642dbad8b
└ @ CuArrays ~/.julia/dev/CuArrays/src/bindeps.jl:218

I am able to replicate this issue on my Windows config (the CUDNN_STATUS_NOT_INITIALIZED error, without having applied the patch).

My config:

  • Windows 10
  • Julia 1.4.2
  • Tested with CUDA 10.2 and then with CUDA 11, but I think Julia was actually using the 10.2 artifact in both cases
  • Nvidia RTX 2080 Max-Q
    (driver version: 451.22; I have also tested with an older version, and the same issue happens)

Initializing a new AlphaZero environment

  Initial report

    Number of network parameters: 620,552
    Number of regularized network parameters: 617,408
    Memory footprint per MCTS node: 380 bytes

  Running benchmark: AlphaZero against MCTS (1000 rollouts)

    Progress:  30%|████████████████████████                                                     |  ETA: 0:04:49CUDNNError: CUDNN_STATUS_NOT_INITIALIZED (code 1)
Stacktrace:
 [1] throw_api_error(::CuArrays.CUDNN.cudnnStatus_t) at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\error.jl:19
 [2] macro expansion at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\error.jl:30 [inlined]
 [3] cudnnCreate(::Base.RefValue{Ptr{Nothing}}) at C:\Users\micka\.julia\packages\CUDAapi\XuSHC\src\call.jl:93
 [4] cudnnCreate at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\base.jl:3 [inlined]
 [5] #515 at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\CUDNN.jl:50 [inlined]
 [6] get!(::CuArrays.CUDNN.var"#515#518"{CUDAdrv.CuContext}, ::IdDict{Any,Any}, ::Any) at .\abstractdict.jl:663
 [7] handle() at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\CUDNN.jl:49
 [8] macro expansion at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\utils.jl:36 [inlined]
 [9] cudnnConvolutionForward(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; algo::Int64, alpha::Int64, beta::Int64) at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\conv.jl:72
 [10] conv!(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; alpha::Int64, algo::Int64) at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\nnlib.jl:61
 [11] conv! at C:\Users\micka\.julia\packages\CuArrays\l0gXB\src\dnn\nnlib.jl:58 [inlined]
 [12] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\micka\.julia\packages\NNlib\FAI3o\src\conv.jl:116
 [13] conv(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,Nothing}, ::NNlib.DenseConvDims{2,(3, 3),3,64,(1, 1),(1, 1, 1, 1),(1, 1),false}) at C:\Users\micka\.julia\packages\NNlib\FAI3o\src\conv.jl:114
 [14] (::Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}})(::CuArrays.CuArray{Float32,4,Nothing}) at C:\Users\micka\.julia\packages\Flux\Fj3bt\src\layers\conv.jl:61
 [15] applychain(::Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}, ::CuArrays.CuArray{Float32,4,Nothing}) at C:\Users\micka\.julia\packages\Flux\Fj3bt\src\layers\basic.jl:36
 [16] (::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CuArrays.CuArray{Float32,4,Nothing},CuArrays.CuArray{Float32,1,Nothing}},Flux.BatchNorm{typeof(NNlib.relu),CuArrays.CuArray{Float32,1,Nothing},CuArrays.CuArray{Float32,1,Nothing},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxNets.var"#19#20"}}}})(::CuArrays.CuArray{Float32,4,Nothing}) at C:\Users\micka\.julia\packages\Flux\Fj3bt\src\layers\basic.jl:38
 [17] forward(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}) at C:\Users\micka\Desktop\AlphaZero.jl\src\networks\flux.jl:184
 [18] evaluate(::ResNet{Game}, ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,2,Nothing}) at C:\Users\micka\Desktop\AlphaZero.jl\src\networks\network.jl:288
 [19] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at C:\Users\micka\Desktop\AlphaZero.jl\src\networks\network.jl:313
 [20] macro expansion at .\util.jl:308 [inlined]
 [21] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at C:\Users\micka\Desktop\AlphaZero.jl\src\mcts.jl:409
 [22] macro expansion at C:\Users\micka\Desktop\AlphaZero.jl\src\util.jl:64 [inlined]
 [23] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at .\task.jl:358

https://github.com/JuliaGPU/CUDA.jl/pull/244 should fix this
