CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9) with Transformers.jl

Hi,

I am optimizing a prompt for a llama2 model with Transformers.jl, and I occasionally see this error:

CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9)
Stacktrace:
  [1] throw_api_error
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:11
  [2] check
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:21 [inlined]
  [3] cudnnSetTensorNdDescriptorEx
    @ ~/.julia/packages/CUDA/tVtYo/lib/utils/call.jl:26
  [4] cudnnTensorDescriptor
    @ ~/.julia/packages/cuDNN/YkZhm/src/descriptors.jl:40
  [5] #cudnnTensorDescriptor#607
    @ ~/.julia/packages/cuDNN/YkZhm/src/tensor.jl:9 [inlined]
  [6] #cudnnSoftmaxForward!#688
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [7] cudnnSoftmaxForward!
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [8] #softmax!#50
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:73
  [9] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70 [inlined]
 [10] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70
 [11] #_collapseddims#15
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:141
 [12] _collapseddims
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:138 [inlined]
...

The stacktrace is not complete, to avoid clutter, but I think it covers the important part. I do not know what to make of it, though. Could it be due to being close to the memory limit of the GPU?

Unlikely; that would manifest as a different error. It seems like NNlib is invoking cuDNN with invalid parameters here. Maybe try running with JULIA_DEBUG=cuDNN and inspecting the arguments/inputs to the API call that fails. If you cross-reference the NVIDIA docs for cudnnSetTensorNdDescriptorEx, you might learn what is being set incorrectly here.
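
Something along these lines (untested sketch; the random array is just a placeholder for whatever tensor actually reaches the failing softmax in your run):

    # Enable debug logging for the cuDNN wrapper so the arguments passed to the
    # failing cudnnSetTensorNdDescriptorEx call get printed.
    ENV["JULIA_DEBUG"] = "cuDNN"

    using CUDA, NNlib, NNlibCUDA

    # Placeholder input; substitute the real array (same eltype and size) that
    # triggers the error in Transformers.jl.
    x = CUDA.rand(Float32, 512, 512, 32)

    # Dispatches to the cudnnSoftmaxForward! path from the stacktrace above; the
    # logged descriptor dimensions can then be compared against the limits in the
    # NVIDIA documentation for cudnnSetTensorNdDescriptorEx.
    NNlib.softmax(x; dims = 1)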

Thanks Tim, I will try to hunt this down. This is good advice.

Did you find the cause of this? I’m asking because I’m getting the exact same error. Unlike the above case, my code does not involve Transformers.jl, but, just as above, the error only occurs when operating close to the memory limit of the GPU.

Hi Per,

I think in the end it was some basic problem, but I do not remember which one. Do you use more than one GPU? One of the things I was playing with was spreading the model across multiple GPUs, which might have been the cause. The second thing I realized was that to take the gradient with respect to llama2, I had to use a GPU with 80 GB of memory.
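
If you suspect memory pressure, a quick check around the failing call might also help; this is just plain CUDA.jl reporting, nothing specific to my setup:

    using CUDA

    # How much device memory is currently used vs. free?
    CUDA.memory_status()

    # Release cached allocations and try again; if the error goes away after
    # this, memory pressure is likely involved after all.
    GC.gc(true)
    CUDA.reclaim()
    CUDA.memory_status()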

Tomas

Hi Tomas, no, I only use one GPU, with 24 GB of RAM. (I’ve not yet tried to make an MWE.)