Hi,
I am optimizing a prompt for a llama2 model with Transformers.jl, and I occasionally see this error:
```
CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9)
Stacktrace:
  [1] throw_api_error
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:11
  [2] check
    @ ~/.julia/packages/cuDNN/YkZhm/src/libcudnn.jl:21 [inlined]
  [3] cudnnSetTensorNdDescriptorEx
    @ ~/.julia/packages/CUDA/tVtYo/lib/utils/call.jl:26
  [4] cudnnTensorDescriptor
    @ ~/.julia/packages/cuDNN/YkZhm/src/descriptors.jl:40
  [5] #cudnnTensorDescriptor#607
    @ ~/.julia/packages/cuDNN/YkZhm/src/tensor.jl:9 [inlined]
  [6] #cudnnSoftmaxForward!#688
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [7] cudnnSoftmaxForward!
    @ ~/.julia/packages/cuDNN/YkZhm/src/softmax.jl:17 [inlined]
  [8] #softmax!#50
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:73
  [9] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70 [inlined]
  [10] softmax!
    @ ~/.julia/packages/NNlibCUDA/C6t0p/src/cudnn/softmax.jl:70
  [11] #_collapseddims#15
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:141
  [12] _collapseddims
    @ ~/.julia/packages/NeuralAttentionlib/3zeYG/src/matmul/collapseddims.jl:138 [inlined]
...
```
The stacktrace is truncated to avoid clutter, but I think it covers the important part. I do not know what to make of it, though. Could it be caused by running close to the memory limit of the GPU?
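In case it is relevant, here is a minimal sketch of how I could check GPU memory pressure right before the failing call, using CUDA.jl's memory queries (where exactly to place this in my code is just my guess):

```julia
using CUDA

# Minimal sketch: inspect GPU memory pressure just before the call
# that intermittently fails (placement in my real code would differ).
CUDA.memory_status()           # prints used/free memory of the pool and device
@show CUDA.available_memory()  # free bytes reported by the driver
@show CUDA.total_memory()      # total bytes on the device
```

If the free memory reported there turns out to be very small whenever the error occurs, would that support the memory-limit explanation, or does CUDNN_STATUS_NOT_SUPPORTED point to something else entirely?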