AlphaZero.jl throwing CUDNNError (code 8)

I'm currently getting an error when running the Connect Four example with Julia v1.6.1.

Could someone with more expertise suggest where I'm going wrong? AlphaZero.jl was installed via Pkg, not from source.
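For reference, this is the standard Pkg-based installation I used (as opposed to the source checkout described later in the thread):

julia> import Pkg

julia> Pkg.add("AlphaZero")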

julia> using AlphaZero

julia> experiment = Examples.experiments["connect-four"]
Experiment("connect-four", AlphaZero.Examples.ConnectFour.GameSpec(), Params(SelfPlayParams(MctsParams(1.0, 2.0, 600, PLSchedule{Float64}([0, 20, 30], [1.0, 1.0, 0.3]), 0.25, 1.0, 1.0), SimParams(5000, 128, 64, true, true, 2, 0.0, false)), nothing, LearningParams(true, true, LOG_WEIGHT, Adam(0.002f0), 0.0001f0, 1.0f0, 1.0f0, 1024, 1024, 1, 2000, 1), ArenaParams(MctsParams(1.0, 2.0, 600, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0), SimParams(128, 128, 128, true, true, 2, 0.5, true), 0.05), 15, true, true, PLSchedule{Int64}([0, 15], [400000, 1000000])), ResNet, ResNetHP(5, 128, (3, 3), 32, 32, 0.1f0), AlphaZero.Benchmark.Duel[AlphaZero.Benchmark.Duel(AlphaZero.Benchmark.Full(MctsParams(1.0, 2.0, 600, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), AlphaZero.Benchmark.MctsRollouts(MctsParams(1.0, 1.0, 1000, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), SimParams(256, 256, 256, true, true, 2, 0.5, false)), AlphaZero.Benchmark.Duel(AlphaZero.Benchmark.NetworkOnly(0.5), AlphaZero.Benchmark.MctsRollouts(MctsParams(1.0, 1.0, 1000, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), SimParams(256, 256, 256, true, true, 2, 0.5, false))])

julia> session = Session(experiment, dir="sessions/connect-four")

Initializing a new AlphaZero environment

Session{Env{AlphaZero.Examples.ConnectFour.GameSpec, ResNet, NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}}}(Env{AlphaZero.Examples.ConnectFour.GameSpec, ResNet, NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}}(AlphaZero.Examples.ConnectFour.GameSpec(), Params(SelfPlayParams(MctsParams(1.0, 2.0, 600, PLSchedule{Float64}([0, 20, 30], [1.0, 1.0, 0.3]), 0.25, 1.0, 1.0), SimParams(5000, 128, 64, true, true, 2, 0.0, false)), nothing, LearningParams(true, true, LOG_WEIGHT, Adam(0.002f0), 0.0001f0, 1.0f0, 1.0f0, 1024, 1024, 1, 2000, 1), ArenaParams(MctsParams(1.0, 2.0, 600, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0), SimParams(128, 128, 128, true, true, 2, 0.5, true), 0.05), 15, true, true, PLSchedule{Int64}([0, 15], [400000, 1000000])), ResNet(AlphaZero.Examples.ConnectFour.GameSpec(), ResNetHP(5, 128, (3, 3), 32, 32, 0.1f0), Chain(Conv((3, 3), 3 => 128, pad=1), BatchNorm(128, relu), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15)), Chain(Conv((1, 1), 128 => 32), BatchNorm(32, relu), flatten, Dense(1344, 128, relu), Dense(128, 1, tanh)), Chain(Conv((1, 1), 128 => 32), BatchNorm(32, relu), flatten, Dense(1344, 7), softmax)), ResNet(AlphaZero.Examples.ConnectFour.GameSpec(), ResNetHP(5, 128, (3, 3), 32, 32, 0.1f0), Chain(Conv((3, 3), 3 => 128, pad=1), BatchNorm(128, relu), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15), Chain(SkipConnection(Chain(Conv((3, 3), 128 => 128, pad=1), BatchNorm(128, 
relu), Conv((3, 3), 128 => 128, pad=1), BatchNorm(128)), +), #15)), Chain(Conv((1, 1), 128 => 32), BatchNorm(32, relu), flatten, Dense(1344, 128, relu), Dense(128, 1, tanh)), Chain(Conv((1, 1), 128 => 32), BatchNorm(32, relu), flatten, Dense(1344, 7), softmax)), MemoryBuffer{AlphaZero.Examples.ConnectFour.GameSpec, NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}}(AlphaZero.Examples.ConnectFour.GameSpec(), AlphaZero.TrainingSample{NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}}[], 0), 0), "sessions/connect-four", AlphaZero.UserInterface.Log.Logger(Base.TTY(Base.Libc.WindowsRawSocket(0x0000000000000254) open, 0 bytes waiting), IOStream(<file sessions/connect-four\log.txt>), 1, , true, false, false), true, false, AlphaZero.Benchmark.Evaluation[AlphaZero.Benchmark.Duel(AlphaZero.Benchmark.Full(MctsParams(1.0, 2.0, 600, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), AlphaZero.Benchmark.MctsRollouts(MctsParams(1.0, 1.0, 1000, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), SimParams(256, 256, 256, true, true, 2, 0.5, false)), AlphaZero.Benchmark.Duel(AlphaZero.Benchmark.NetworkOnly(0.5), AlphaZero.Benchmark.MctsRollouts(MctsParams(1.0, 1.0, 1000, ConstSchedule{Float64}(0.2), 0.05, 1.0, 1.0)), SimParams(256, 256, 256, true, true, 2, 0.5, false))], nothing, AlphaZero.UserInterface.SessionReport(AlphaZero.Report.Iteration[], Vector{AlphaZero.Report.Evaluation}[]))

julia> resume!(session)
  Initial report

    Number of network parameters: 1,672,328
    Number of regularized network parameters: 1,667,776
    Memory footprint per MCTS node: 326 bytes
  
  Running benchmark: AlphaZero against MCTS (1000 rollouts)

    Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:03:16
    
    Average reward: -0.86 (7% won, 0% draw, 93% lost), redundancy: 15.1%

  Running benchmark: Network Only against MCTS (1000 rollouts)

CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\cudnn\error.jl:22
  [2] macro expansion
    @ C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\cudnn\error.jl:39 [inlined]
  [3] cudnnActivationForward(handle::Ptr{Nothing}, activationDesc::CUDA.CUDNN.cudnnActivationDescriptor, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUDNN C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\utils\call.jl:26
  [4] #cudnnActivationForwardAD#657
    @ C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\cudnn\activation.jl:48 [inlined]
  [5] #cudnnActivationForwardWithDefaults#656
    @ C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\cudnn\activation.jl:42 [inlined]
  [6] #cudnnActivationForward!#653
    @ C:\Users\ianth\.julia\packages\CUDA\9T5Sq\lib\cudnn\activation.jl:22 [inlined]
  [7] #65
    @ C:\Users\ianth\.julia\packages\NNlibCUDA\EENEy\src\cudnn\activations.jl:13 [inlined]
  [8] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(NNlib.relu), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}})
    @ NNlibCUDA C:\Users\ianth\.julia\packages\NNlibCUDA\EENEy\src\cudnn\activations.jl:30
  [9] (::Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, cache::Nothing)
    @ Flux.CUDAint C:\Users\ianth\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:9
 [10] BatchNorm
    @ C:\Users\ianth\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:6 [inlined]
 [11] applychain(fs::Tuple{Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}) (repeats 2 times)
    @ Flux C:\Users\ianth\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:37
 [12] (::Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(NNlib.relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.Conv{2, 2, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Flux.BatchNorm{typeof(identity), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Flux C:\Users\ianth\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:39
 [13] forward(nn::ResNet, state::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ AlphaZero.FluxLib C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\networks\flux.jl:142
 [14] forward_normalized(nn::ResNet, state::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, actions_mask::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ AlphaZero.Network C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\networks\network.jl:264
 [15] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}})
    @ AlphaZero.Network C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\networks\network.jl:312
 [16] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}}; batch_size::Int64, fill_batches::Bool) 
    @ AlphaZero C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\simulations.jl:32
 [17] (::AlphaZero.var"#36#37"{Int64, Bool, ResNet})(batch::Vector{NamedTuple{(:board, :curplayer), Tuple{StaticArrays.SMatrix{7, 6, UInt8, 42}, UInt8}}})
    @ AlphaZero C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\simulations.jl:54
    @ C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\batchifier.jl:68 [inlined]
 [19] macro expansion
    @ C:\Users\ianth\.julia\packages\AlphaZero\UZd8L\src\util.jl:20 [inlined]
 [20] (::AlphaZero.Batchifier.var"#2#4"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
    @ AlphaZero.Batchifier C:\Users\ianth\.julia\packages\ThreadPools\ROFEh\src\macros.jl:261
Interrupted by the user


Here are my current NVIDIA environment details:

julia> import Pkg;

julia> Pkg.add("CUDA")
    Updating registry at `C:\Users\ianth\.julia\registries\General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
    Updating `C:\Users\ianth\.julia\environments\v1.6\Project.toml`
  [052768ef] + CUDA v3.4.2
  No Changes to `C:\Users\ianth\.julia\environments\v1.6\Manifest.toml`


julia> CUDA.versioninfo()
CUDA toolkit 11.4.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 462.80.0

Libraries:
- CUBLAS: 11.5.4
- CURAND: 10.2.5
- CUFFT: 10.5.1
- CUSOLVER: 11.2.0
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+462.80
- CUDNN: 8.20.2 (for CUDA 11.4.0)
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: GeForce RTX 2060 (sm_75, 4.961 GiB / 6.000 GiB available)

julia> [CUDA.capability(dev) for dev in CUDA.devices()]
1-element Vector{VersionNumber}:
 v"7.5.0"

I've had several people complain about code 8 CUDNN errors (see here for example). There was even a period when I myself (the author of AlphaZero.jl) could not run the Connect Four benchmark on my own machine, although things have been working well on my end since CUDA.jl v3.x.

A frequent cause of code 8 errors seems to be out-of-memory errors in disguise, so as a first step you should decrease the size of the neural network and train again (a sketch of what that might look like is below). That being said, given your GPU, you do seem to have enough memory to run the Connect Four benchmark with the default hyperparameters, so this surprises me a little. You are also not the first person to report this, so there may be something to investigate on the CUDA.jl side.
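As a rough illustration of "decrease the size of the network", the sketch below builds a smaller ResNetHP than the default shown in the session printout above (5 blocks, 128 filters). The keyword names are assumptions based on the Connect Four example parameters, so check them against your AlphaZero.jl version; ResNetHP may also need to be qualified as NetLib.ResNetHP depending on how you load it.

using AlphaZero

# Hedged sketch: a smaller residual network than the default
# ResNetHP(5, 128, (3, 3), 32, 32, 0.1f0) printed above.
small_netparams = ResNetHP(
  num_blocks = 3,                    # default uses 5
  num_filters = 64,                  # default uses 128
  conv_kernel_size = (3, 3),
  num_policy_head_filters = 32,
  num_value_head_filters = 32,
  batch_norm_momentum = 0.1f0)

# Substitute `small_netparams` for the default network hyperparameters in a
# copy of the Connect Four training script, then retrain.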

[Side question / off topic] This question comes from someone with limited knowledge of GPU programming. Is it possible to use AlphaZero with other GPU accelerators? Does it perhaps use KernelAbstractions.jl, or is the code very specific to CUDA and hard to adjust for someone without deep experience?

I have never tested AlphaZero.jl with another GPU accelerator, but it should be possible with only a handful of small modifications. Indeed, most of the code in AlphaZero.jl is completely agnostic to the GPU backend you are using, and AlphaZero.jl is already compatible with both Flux and Knet.

If you have a good experience using AlphaZero.jl with an alternative GPU backend, please consider submitting a PR.

[…] most of the code in AlphaZero.jl is completely agnostic to the GPU backend […]

Thank you for the information.

If you have a good experience using AlphaZero.jl with an alternative GPU backend, please consider submitting a PR.

Although “a good experience” sounds subjective, I will consider it, though it may not happen immediately. Please note that this is also a new topic for me.

@jonathan-laurent Would you consider providing some additional information that might be useful in the process of implementation of relevant adjustments?

I have no personal experience with KernelAbstractions, but AlphaZero.jl exposes a network interface, so it should work out of the box if you provide a new network implementation that relies on KernelAbstractions (a rough skeleton is sketched below). Alternatively, if you manage to make Flux work with it, you can probably reuse some networks from the AlphaZero.jl network library with only minor modifications.
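To make this a bit more concrete, here is a very rough skeleton of what a custom backend might look like. The only interface function confirmed in this thread is AlphaZero.Network.forward (it appears in the stack trace above); the AbstractNetwork supertype and the remaining required methods are assumptions, so treat src/networks/network.jl in your AlphaZero.jl version as the authoritative reference.

using AlphaZero

# Hypothetical network type backed by a non-CUDA accelerator
# (e.g. via KernelAbstractions.jl); the field layout is purely illustrative.
struct MyBackendNet <: AlphaZero.AbstractNetwork
  weights   # device arrays for your backend
end

# `forward` receives a batch of vectorized states and must return the policy
# and value head outputs expected by the rest of the pipeline.
function AlphaZero.Network.forward(nn::MyBackendNet, states)
  # ... run the convolutions / dense layers with your backend's kernels ...
end

# Several other methods (GPU transfer, training, serialization, ...) are part
# of the interface and would need to be implemented as well.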

The only direct references to kernels that I saw in the AlphaZero.jl (v0.52) code were related to the ResNet network, hence the question. I run it mostly with the default settings and Flux. The Flux documentation states:

“Support for array operations on other hardware backends, like GPUs, is provided by external packages like CUDA. Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.” GPU Support · Flux
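As a minimal illustration of the quoted idiom (assuming the CUDA.jl backend Flux supported at the time), moving the weights and the data with gpu is all that is required; the forward pass itself is unchanged:

using Flux, CUDA

m = Chain(Dense(10, 5, relu), Dense(5, 1)) |> gpu   # move the weights to the GPU
x = gpu(rand(Float32, 10, 16))                      # move a batch of inputs
y = m(x)                                            # runs on the GPU
y_cpu = cpu(y)                                      # bring the results back if needed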

I will investigate it further. Should I have any additional questions, I will probably create a separate topic. Thank you.

Flux's GPU support is currently somewhat tied to CUDA.jl, but that could be improved: https://github.com/FluxML/Flux.jl/pull/1566

For the CUDNN error, it's hard to work on this without a more concrete MWE. You could try running with JULIA_DEBUG=CUDNN to see if any API calls look off (e.g. if we're passing unrealistic inputs or parameters, …).
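For reference, a minimal way to try this, assuming the Connect Four session from the top of the thread as the reproducer (no smaller MWE is available yet):

# Enable cuDNN debug logging, then reproduce the failure and inspect the
# logged API calls for suspicious shapes or parameters.
ENV["JULIA_DEBUG"] = "CUDNN"   # or set JULIA_DEBUG=CUDNN in the shell before launching Julia

using AlphaZero
experiment = Examples.experiments["connect-four"]
session = Session(experiment, dir="sessions/connect-four")
resume!(session)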

@maleadt

Flux’ GPU support is currently somewhat tied into CUDA.jl, but that could be improved […]

Thanks a lot for this information. I'll be looking forward to future developments.

Thanks for your feedback. I was able to work around the issue by following the installation instructions in the AlphaZero.jl README on GitHub.

Rather than relying on automated dependency installation, I manually installed CUDA.jl separately via Pkg and then ran the project from a source checkout:

export GKSwstype=100  # To avoid an occasional GR bug
git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
cd AlphaZero.jl
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project -e 'using AlphaZero; Scripts.train("connect-four")'

Unfortunately, when I ran the AlphaZero examples (connect-four and tictactoe), both terminated early. These failures appear to be due to insufficient memory; I'll investigate them separately after further tests.

At least I have been able to move forward.