Testing CUDA v3.8.0 fails

I have added CUDA. When I test it, the test fails:

Testing Running tests…
┌ Info: System information:
│ CUDA toolkit 11.6, artifact installation
│ NVIDIA driver 470.82.1, for CUDA 11.4
│ CUDA driver 11.6

│ Libraries:
│ - CUBLAS: 11.8.1
│ - CURAND: 10.2.9
│ - CUFFT: 10.7.0
│ - CUSOLVER: 11.3.2
│ - CUSPARSE: 11.7.1
│ - CUPTI: 16.0.0
│ - NVML: 11.0.0+470.82.1
│ - CUDNN: 8.30.2 (for CUDA 11.5.0)
│ - CUTENSOR: 1.4.0 (for CUDA 11.5.0)

│ Toolchain:
│ - Julia: 1.7.1
│ - LLVM: 12.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

│ 1 device:
└ 0: Tesla K20c (sm_35, 4.442 GiB / 4.633 GiB available)
ERROR: LoadError: CUDA error: unknown error (code 999, ERROR_UNKNOWN)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/bki2w/lib/cudadrv/error.jl:91
[2] macro expansion
@ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/error.jl:101 [inlined]
[3] cuDevicePrimaryCtxRetain
@ ~/.julia/packages/CUDA/bki2w/lib/utils/call.jl:26 [inlined]
[4] CuContext
@ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/context.jl:55 [inlined]
[5] context(dev::CuDevice)
@ CUDA ~/.julia/packages/CUDA/bki2w/lib/cudadrv/state.jl:222
[6] device!
@ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/state.jl:256 [inlined]
[7] device!
@ ~/.julia/packages/CUDA/bki2w/lib/cudadrv/state.jl:244 [inlined]
[8] top-level scope
@ ~/.julia/packages/CUDA/bki2w/test/runtests.jl:136
[9] include(fname::String)
@ Base.MainInclude ./client.jl:451
[10] top-level scope
@ none:6
in expression starting at /home/gpus/.julia/packages/CUDA/bki2w/test/runtests.jl:127
ERROR: Package CUDA errored during testing

I tested the GPU with MATLAB, and it works.
P.S. The GPU also works when I use CUDA 3.5.

This generally means an issue with your driver, and not CUDA.jl. Unless of course you say it didn’t occur with CUDA.jl 3.5, but does with CUDA.jl 3.6 or higher. Can you confirm that, and/or try to narrow down which exact version introduced the failure?
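If it helps with narrowing this down, here is a rough sketch of how a single release could be tested in a throwaway environment, so your default environment stays untouched. The helper name test_cuda_version is made up for illustration, and the sketch assumes julia is on your PATH and that the package server is reachable from your machine.

```shell
# Sketch only: test_cuda_version is a hypothetical helper, not part of CUDA.jl or Pkg.
# It installs one CUDA.jl release into a fresh temporary project and runs its tests.
test_cuda_version() {
    version="$1"
    env_dir="$(mktemp -d)"   # fresh project directory, discarded afterwards by the OS
    julia --project="$env_dir" -e '
        using Pkg
        Pkg.add(name="CUDA", version=ARGS[1])   # pin the requested release
        Pkg.test("CUDA")                        # throws (non-zero exit) if the tests fail
    ' "$version"
}

# Example (commented out so that sourcing this file has no side effects):
# for v in 3.6.0 3.7.0; do
#     if test_cuda_version "$v"; then echo "$v: ok"; else echo "$v: FAILED"; fi
# done
```

The exit status of the function is simply that of the julia process, so it can be used directly in an if statement as in the commented example.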

OK, I tested CUDA@3.6.0 and it works. Then I removed CUDA and added CUDA@3.7.0, and it does not work.
Thanks!

Can you bisect even further? If you clone the CUDA.jl repository, you can do a git bisect and run julia --project -e 'using Pkg; Pkg.test()' at every step.

Sorry, I am not very familiar with git, and we cannot fetch anything from github.com directly. I will try. Thanks, best wishes!

You start by cloning CUDA.jl from GitHub and working from that directory:

git clone https://github.com/JuliaGPU/CUDA.jl/
cd CUDA.jl
git bisect start
git checkout v3.7.0
git bisect bad
git checkout v3.6.0
git bisect good

(Spelling out the commands here for clarity; I know there are shortcuts.)
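For reference, one such shortcut is that git bisect start accepts the bad revision followed by the good one, which replaces the separate checkout and marking commands. A minimal sketch, wrapped in a helper whose name (start_cuda_bisect) is purely illustrative; it assumes you are already inside the cloned CUDA.jl directory:

```shell
# Sketch only: start_cuda_bisect is a hypothetical wrapper name.
# `git bisect start <bad> <good>` starts the session and marks both endpoints at once.
start_cuda_bisect() {
    bad="${1:-v3.7.0}"    # first version known to fail
    good="${2:-v3.6.0}"   # last version known to work
    git bisect start "$bad" "$good"
}

# Usage, from inside the CUDA.jl clone:
# start_cuda_bisect            # uses the v3.7.0 / v3.6.0 tags from this thread
# start_cuda_bisect v3.7.0 v3.6.0
```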

At that point, git will drop you at specific commits in the repository’s history. At each one, run julia --project -e 'using Pkg; Pkg.test()', and depending on whether that fails or succeeds, execute git bisect bad or git bisect good, respectively. After about 4 such steps, git will tell you which commit is to blame.
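If you would rather not do this by hand, git can drive the loop itself with git bisect run, which treats exit status 0 as good and 1 to 127 (except 125, which means skip) as bad. The sketch below assumes the bisect session has already been started and both endpoints marked as above, and that julia exits with a non-zero status when Pkg.test() fails. The name auto_bisect is made up for illustration. Note that any failure, not only the error 999 from this thread, would mark a commit as bad.

```shell
# Sketch only: auto_bisect is a hypothetical helper name.
# Assumes `git bisect start` has been run and one good and one bad commit are marked.
auto_bisect() {
    # git checks out each candidate commit and runs the command that follows `run`;
    # the command's exit status decides whether the commit is marked good or bad.
    git bisect run julia --project -e 'using Pkg; Pkg.test()'
}

# Usage, from inside the CUDA.jl clone:
# auto_bisect
# git bisect reset    # return to the original checkout when done
```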

HEAD is now at 92f0dce6 Bump version.
gpus@gpus-ESC700-159:~/Downloads/CUDA.jl$ git bisect bad
gpus@gpus-ESC700-159:~/Downloads/CUDA.jl$ git checkout v3.6.0
Previous HEAD position was 92f0dce6 Bump version.
HEAD is now at 2a6bfa6c Merge pull request #1288 from JuliaGPU/tb/shared_isbits
gpus@gpus-ESC700-159:~/Downloads/CUDA.jl$ git bisect good
Bisecting: 20 revisions left to test after this (roughly 4 steps)
[db6170c4d8388f01088b5fe49c6ab3df567a8cfe] Try running CI with the package server enabled.

Now you have to actually test; see the last paragraph of my previous post.
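Concretely, one manual step from where git has just dropped you could look like the sketch below. The helper name bisect_step is hypothetical, and it assumes that a failing test run makes julia exit with a non-zero status.

```shell
# Sketch only: bisect_step is a hypothetical helper name.
# Runs the test suite at the commit git has checked out, then reports the result to git.
bisect_step() {
    if julia --project -e 'using Pkg; Pkg.test()'; then
        git bisect good   # tests passed: the breakage was introduced later
    else
        git bisect bad    # tests failed: this commit or an earlier one is to blame
    fi
}

# Usage: call bisect_step repeatedly until git prints "<sha> is the first bad commit".
```

After each call git checks out the next commit to try, so you just repeat the call until it names the offending commit.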

Thank you very much!