Need help with CUDA.jl installation

Hi everybody! I had CUDA.jl installed and running nicely last summer, but somehow broke it since then (a driver/CUDA update?). The problem is probably not directly related to CUDA.jl. I searched dmesg for errors and rebooted, and nvidia-smi works.

The problem
julia> versioninfo()
Julia Version 1.7.0-DEV.203
Commit b00e9f0bac (2020-12-31 06:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 7 1700X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, znver1)
Environment:
  JULIA_DEBUG = CUDA

julia> using CUDA
julia> CUDA.version()

┌ Debug: Initializing CUDA driver
└ @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:88
┌ Error: Recursion during initialization of CUDA.jl
└ @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:38
┌ Error: Error during initialization of CUDA.jl
│   exception =
│    CUDA error (code 999, CUDA_ERROR_UNKNOWN)
│    Stacktrace:
│      [1] throw_api_error(res::CUDA.cudaError_enum)
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/error.jl:97
│      [2] __configure__()
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:93
│      [3] macro expansion
│        @ ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:30 [inlined]
│      [4] macro expansion
│        @ ./lock.jl:209 [inlined]
│      [5] _functional(show_reason::Bool)
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:26
│      [6] functional(show_reason::Bool)
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:19
│      [7] libcuda()
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:47
│      [8] macro expansion
│        @ ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/libcuda.jl:23 [inlined]
│      [9] macro expansion
│        @ ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/error.jl:102 [inlined]
│     [10] cuDriverGetVersion
│        @ ~/.julia/packages/CUDA/qSZa3/lib/utils/call.jl:26 [inlined]
│     [11] version()
│        @ CUDA ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/version.jl:10
│     [12] top-level scope
│        @ REPL[2]:1
│     [13] eval(m::Module, e::Any)
│        @ Core ./boot.jl:369
│     [14] eval_user_input(ast::Any, backend::REPL.REPLBackend)
│        @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:139
│     [15] repl_backend_loop(backend::REPL.REPLBackend)
│        @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:200
│     [16] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
│        @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:185
│     [17] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
│        @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:317
│     [18] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:305
│     [19] (::Base.var"#890#892"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:394
│     [20] #invokelatest#2
│        @ ./essentials.jl:710 [inlined]
│     [21] invokelatest
│        @ ./essentials.jl:708 [inlined]
│     [22] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:379
│     [23] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:309
│     [24] _start()
│        @ Base ./client.jl:492
└ @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:34
ERROR: CUDA.jl did not successfully initialize, and is not usable.
If you did not see any other error message, try again in a new session
with the JULIA_DEBUG environment variable set to 'CUDA'.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] libcuda()
   @ CUDA ~/.julia/packages/CUDA/qSZa3/src/initialization.jl:48
 [3] macro expansion
   @ ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/libcuda.jl:23 [inlined]
 [4] macro expansion
   @ ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/error.jl:102 [inlined]
 [5] cuDriverGetVersion
   @ ~/.julia/packages/CUDA/qSZa3/lib/utils/call.jl:26 [inlined]
 [6] version()
   @ CUDA ~/.julia/packages/CUDA/qSZa3/lib/cudadrv/version.jl:10
 [7] top-level scope
   @ REPL[2]:1

I’m using:
Debian Stretch
NVIDIA driver 450.80.02 from backports
system CUDA 11.1 installation from backports
nvcc is at: /usr/bin/nvcc
libcuda.so is at: /usr/lib/i386-linux-gnu/nvidia/current/libcuda.so

I assume CUDA.jl doesn’t download artifacts because the driver version is unsupported (although CUDA 11.1 should be binary-compatible with driver 450), and that it doesn’t find the local installation made with apt. Can you give me a hint about what to do?

That’s not right. CUDA drivers are backwards compatible, so we would just download an older version of the CUDA toolkit.
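To illustrate what backwards compatibility means for toolkit selection, here is a hypothetical sketch: given the CUDA version a driver reports as compatible, pick the newest available toolkit that is not newer than it. The function name `select_toolkit` and the version lists are assumptions for illustration only; this is not CUDA.jl's actual selection code.

```julia
# Hypothetical sketch, not CUDA.jl's implementation: choose the newest toolkit
# version that the driver can run. Drivers are backwards compatible, so any
# toolkit version <= the driver's supported CUDA version is a candidate.
function select_toolkit(driver::VersionNumber, available::Vector{VersionNumber})
    compatible = filter(v -> v <= driver, available)
    isempty(compatible) && error("no toolkit compatible with driver version $driver")
    return maximum(compatible)
end

available = [v"10.2", v"11.0", v"11.1", v"11.2"]  # assumed list of downloadable toolkits
select_toolkit(v"11.0", available)  # an older driver still gets a usable (older) toolkit
```

The point of the design is that a driver being "too old" for the newest toolkit is not a reason to give up: an older toolkit is simply selected instead.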

Your issue is that your driver setup is broken: the driver returns error 999 (CUDA_ERROR_UNKNOWN) when something serious is wrong. CUDA.jl doesn’t even get to the point of downloading or looking for the CUDA toolkit. So try rebooting, reinstalling your driver (making sure your libcuda, which is part of the driver, matches the exact version of the NVIDIA driver), etc.
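The libcuda/driver version check suggested above can be sketched as follows. This assumes a Linux system where the loaded kernel module reports its version in /proc/driver/nvidia/version (a line like "NVRM version: NVIDIA UNIX x86_64 Kernel Module  450.80.02 ...") and where the driver ships versioned files named libcuda.so.<driver version>. The helper names are hypothetical, and the directory to scan depends on your distribution's packaging.

```julia
# Hypothetical helpers for checking that libcuda matches the running NVIDIA driver.

"Extract the driver version (e.g. \"450.80.02\") from the text of /proc/driver/nvidia/version."
function driver_version(proc_text::AbstractString)
    m = match(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    m === nothing && error("could not find the kernel module version")
    return String(m.captures[1])
end

"Return the version suffixes of files named libcuda.so.X.Y[.Z] in `dir` (skips libcuda.so.1)."
function libcuda_versions(dir::AbstractString)
    versions = String[]
    for f in readdir(dir)
        m = match(r"^libcuda\.so\.(\d+\.\d+(?:\.\d+)*)$", f)
        m === nothing || push!(versions, String(m.captures[1]))
    end
    return versions
end

"True if at least one versioned libcuda exists in `dir` and all of them match the driver."
function libcuda_matches_driver(proc_text::AbstractString, dir::AbstractString)
    drv = driver_version(proc_text)
    libs = libcuda_versions(dir)
    return !isempty(libs) && all(==(drv), libs)
end

# Example usage (the directory is an assumption; use wherever your libcuda.so.* files live):
# proc_text = read("/proc/driver/nvidia/version", String)
# libcuda_matches_driver(proc_text, "/path/to/libcuda/dir")
```

A mismatch (or a leftover libcuda from a previous driver) is the kind of inconsistency that a driver reinstall is meant to clear up.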


Thank you. My driver installation was messed up indeed. After an update everything works now.

@maleadt Could this be the reason I’m running into trouble? Is CUDA.jl supported with CUDA 11.3 drivers?

Running

] test CUDA

from my console, I run into:

Precompiling project...
  1 dependency successfully precompiled in 2 seconds (39 already precompiled)
     Testing Running tests...
┌ Debug: Initializing CUDA driver
└ @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:100
┌ Debug: Trying to use artifacts...
└ @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:146
┌ Debug: Selecting artifacts based on driver compatibility 11.3.0
└ @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:158
┌ Error: Error during initialization of CUDA.jl
│   exception =
│    AssertionError: isfile(__nvdisasm[])
│    Stacktrace:
│      [1] use_artifact_cuda()
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:184
│      [2] __init_dependencies__()
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:414
│      [3] __runtime_init__()
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:112
│      [4] macro expansion
│        @ C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:31 [inlined]
│      [5] macro expansion
│        @ .\lock.jl:209 [inlined]
│      [6] _functional(show_reason::Bool)
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:27
│      [7] functional(show_reason::Bool)
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:19
│      [8] macro expansion
│        @ C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:51 [inlined]
│      [9] toolkit_release()
│        @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:33
│     [10] top-level scope
│        @ C:\Users\phill\.julia\packages\CUDA\k52QH\test\runtests.jl:112
│     [11] include(fname::String)
│        @ Base.MainInclude .\client.jl:444
│     [12] top-level scope
│        @ none:6
│     [13] eval
│        @ .\boot.jl:360 [inlined]
│     [14] exec_options(opts::Base.JLOptions)
│        @ Base .\client.jl:261
│     [15] _start()
│        @ Base .\client.jl:485
└ @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:34
ERROR: LoadError: CUDA.jl did not successfully initialize, and is not usable.
If you did not see any other error message, try again in a new session
with the JULIA_DEBUG environment variable set to 'CUDA'.
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] macro expansion
   @ C:\Users\phill\.julia\packages\CUDA\k52QH\src\initialization.jl:52 [inlined]
 [3] toolkit_release()
   @ CUDA C:\Users\phill\.julia\packages\CUDA\k52QH\deps\bindeps.jl:33
 [4] top-level scope
   @ C:\Users\phill\.julia\packages\CUDA\k52QH\test\runtests.jl:112
 [5] include(fname::String)
   @ Base.MainInclude .\client.jl:444
 [6] top-level scope
   @ none:6
in expression starting at C:\Users\phill\.julia\packages\CUDA\k52QH\test\runtests.jl:112
ERROR: Package CUDA errored during testing

That shouldn’t happen; failed assertions like that are always bugs, so please file an issue (it would be useful if you could add a @show around __nvdisasm[] there and list the contents of the folder it reports).
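For the debugging information requested above, a minimal sketch of what to collect: after adding `@show __nvdisasm[]` just before the failing assertion, pass the printed path to a small helper that reports whether the file exists and lists the folder it lives in. The helper `inspect_path` is hypothetical and not part of CUDA.jl.

```julia
# Hypothetical helper: show whether an expected file exists and list its folder.
function inspect_path(path::AbstractString)
    @show path isfile(path)
    dir = dirname(path)
    if isdir(dir)
        entries = readdir(dir)
        println("Contents of ", dir, ":")
        foreach(f -> println("  ", f), entries)
        return entries
    else
        println("Directory does not exist: ", dir)
        return String[]
    end
end

# Example usage: paste in the path that `@show __nvdisasm[]` printed.
# inspect_path(raw"C:\path\printed\by\show\nvdisasm.exe")
```

Including this output in the issue shows at a glance whether the tool is missing from the artifact or whether the path itself is wrong.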

CUDA 11.3 is supported but not selected by default; here it will have used an artifact for 11.2. The just-released CUDA.jl 3.2 does fully support and default to CUDA 11.3, though.