Failed to load `CUDAdrv`

affans · December 20, 2019, 6:51pm

I am, for the first time, trying to learn how GPU programming works. I have two Tesla K80s on an HPC cluster. To begin, I request the proper node through Slurm and launch an interactive job.

[affans@hpc ~]$ srun -p gpuq --gres=gpu:2 --pty bash
[affans@node018 ~]$ nvidia-smi
Fri Dec 20 13:46:43 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:06:00.0 Off |                    0 |
| N/A   30C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:07:00.0 Off |                    0 |
| N/A   24C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Then I launch Julia in this shell and try to load CUDAdrv. (I have two installations: v1.2, local to my home directory, and v1.0.3, installed at the system level, which no one uses.)

[affans@node018 ~]$ export JULIA_CUDA_VERBOSE=true
[affans@node018 ~]$ ./bin/julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CUDAdrv
┌ Error: CUDAdrv.jl failed to initialize
│   exception =
│    CUDA error: unknown error (code 999, ERROR_UNKNOWN)
│    Stacktrace:
│     [1] throw_api_error(::CUDAdrv.cudaError_enum) at /home/affans/.julia/packages/CUDAdrv/i465Q/src/error.jl:131
│     [2] macro expansion at /home/affans/.julia/packages/CUDAdrv/i465Q/src/error.jl:144 [inlined]
│     [3] cuInit at /home/affans/.julia/packages/CUDAdrv/i465Q/src/libcuda.jl:18 [inlined]
│     [4] __init__() at /home/affans/.julia/packages/CUDAdrv/i465Q/src/CUDAdrv.jl:56
│     [5] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:685
│     [6] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:765
│     [7] _require(::Base.PkgId) at ./loading.jl:990
│     [8] require(::Base.PkgId) at ./loading.jl:911
│     [9] require(::Module, ::Symbol) at ./loading.jl:906
│     [10] eval(::Module, ::Any) at ./boot.jl:330
│     [11] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:86
│     [12] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:118 [inlined]
│     [13] (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})() at ./task.jl:268
└ @ CUDAdrv ~/.julia/packages/CUDAdrv/i465Q/src/CUDAdrv.jl:67

The error message is very cryptic. No idea how to even debug this.

I think I should use a basic C script first to test the GPUs and make sure the libraries work before using Julia. Is there a basic C script that simply prints out the names of the devices?

maleadt · December 20, 2019, 10:12pm

We can’t provide you with much more information, though; that’s just what the NVIDIA driver reports. Please make sure CUDA C-compiled binaries work (the CUDA toolkit comes with samples). This error happens when initializing CUDA, which is one of the first things CUDAdrv does. Typically it is not related to Julia or CUDAdrv.

affans · December 25, 2019, 9:33pm

That’s what I figured. What would be the simplest C script to run to see if everything is working as intended? “simple C GPU script” doesn’t really help on Google.

KajWiik · December 26, 2019, 1:02am

See e.g.
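
One way to check the driver independently of Julia is sketched below. It is written in Python rather than C so that it needs no compiler or CUDA toolkit headers; it loads the driver library with `ctypes` and makes the same `cuInit` call that fails in the stack trace, then lists device names. The library name `libcuda.so.1` and the helper names (`load_driver`, `list_devices`, `report`) are assumptions for illustration, not something from this thread, so treat it as a rough probe rather than a definitive test.

```python
import ctypes

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA driver API


def load_driver(name="libcuda.so.1"):
    """Return a ctypes handle to the CUDA driver library, or None if it cannot be loaded.

    The default library name is an assumption (typical on Linux).
    """
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def list_devices(lib):
    """Initialize the driver and return (status, device_names).

    On failure, status is the first non-zero CUresult code and the
    names collected up to that point are returned.
    """
    status = lib.cuInit(0)  # the same driver call that CUDAdrv makes at load time
    if status != CUDA_SUCCESS:
        return status, []

    count = ctypes.c_int(0)
    status = lib.cuDeviceGetCount(ctypes.byref(count))
    if status != CUDA_SUCCESS:
        return status, []

    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)  # CUdevice is a plain int handle
        status = lib.cuDeviceGet(ctypes.byref(device), ordinal)
        if status != CUDA_SUCCESS:
            return status, names
        buf = ctypes.create_string_buffer(256)
        status = lib.cuDeviceGetName(buf, len(buf), device)
        if status != CUDA_SUCCESS:
            return status, names
        names.append(buf.value.decode())
    return CUDA_SUCCESS, names


def report(libname="libcuda.so.1"):
    """Print what the driver reports, without raising on failure."""
    lib = load_driver(libname)
    if lib is None:
        print(f"could not load {libname}")
        return
    status, names = list_devices(lib)
    if status != CUDA_SUCCESS:
        print(f"CUDA driver call failed with code {status}")
    for i, name in enumerate(names):
        print(f"device {i}: {name}")


if __name__ == "__main__":
    report()
```

Saved under a name of your choosing and run with `python3` on the GPU node (inside the `srun` session), this should either list the devices or print the failing driver code. If it also fails in `cuInit`, the problem is most likely in the driver setup on the node rather than in Julia.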

Here’s how I compiled and tested the Julia CUDA libraries on our cluster:

mkdir bin; cd bin
curl -o julia-1.3.0-linux-x86_64.tar.gz https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.0-linux-x86_64.tar.gz
tar xzvf julia-1.3.0-linux-x86_64.tar.gz
module add CUDA
ln -s julia-1.3.0 julia
julia/bin/julia
]
pkg> add CUDAapi CUDAdrv CUDAnative CuArrays BenchmarkTools DiffResults ForwardDiff Compat

cat > testcuarrays.jl
using Pkg
Pkg.test("CuArrays")

srun -p gpu --mem=50G --time=5:00:00 bash -c 'pwd;echo $SLURMD_NODENAME;~/bin/julia-1.3.0/bin/julia ~/bin/testcuarrays.jl'
