CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9) with Transformers.jl

Hi,

I am optimizing a prompt for a Llama 2 model with Transformers.jl, and I occasionally see this error:

CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9)
Stacktrace:
  [1] throw_api_error
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:11
  [2] check
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:21 [inlined]
  [3] cudnnSetTensorNdDescriptorEx
    @ ~/.julia/packages/CUDA/tVtYo/lib/utils/call.jl:26
  [4] cudnnTensorDescriptor
    @ ~/.julia/packages/cuDNN/YkZhm/src/descriptors.jl:40
  [5] #cudnnTensorDescriptor#607
    @ ~/.julia/packages/cuDNN/YkZhm/src/tensor.jl:9 [inlined]
  [6] #cudnnSoftmaxForward!#688
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [7] cudnnSoftmaxForward!
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [8] #softmax!#50
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:73
  [9] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70 [inlined]
 [10] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70
 [11] #_collapseddims#15
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:141
 [12] _collapseddims
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:138 [inline
...

I truncated the stacktrace to avoid clutter, but I think it covers the important part. I do not know what to make of it, though. Could it be caused by running close to the GPU's memory limit?

Unlikely; that would manifest as a different error. It looks like NNlib is invoking cuDNN with invalid parameters here. Try running with JULIA_DEBUG=cuDNN and inspecting the arguments/inputs to the API call that fails. If you cross-reference them with the NVIDIA docs for cudnnSetTensorNdDescriptorEx, you might learn what is being set incorrectly.
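Since the error only shows up occasionally, it may help to record which inputs trigger it. Below is a minimal sketch, not something taken from cuDNN.jl or NNlib: `report_on_failure` is a hypothetical helper name, and setting `ENV["JULIA_DEBUG"]` inside the session is assumed to have the same effect as exporting the variable before starting Julia.

```julia
# Turn on debug-level logging for the cuDNN module (same effect as
# launching Julia with JULIA_DEBUG=cuDNN in the environment).
ENV["JULIA_DEBUG"] = "cuDNN"

# Hypothetical helper, not part of any package: run `f(x)` and, if it
# throws, log the size and element type of `x` before rethrowing, so the
# failing inputs can be compared with the ones that succeed.
function report_on_failure(f, x)
    try
        return f(x)
    catch err
        @error "call failed" input_size = size(x) input_eltype = eltype(x) exception = err
        rethrow()
    end
end

# Usage with a plain CPU array; in the real code, `f` would be the call
# that ends up in softmax! and `x` the array passed to it.
y = report_on_failure(x -> sum(x; dims = 1), ones(Float32, 4, 3))
```

The logged size and element type can then be checked against the constraints listed in the NVIDIA documentation for cudnnSetTensorNdDescriptorEx.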

Thanks Tim, I will try to hunt this down. This is good advice.