Juliacon21-gpu_workshop notebook `could not load library "libcuda.so.1"`

I am trying to get started on the JuliaCon GPU tutorial. I have tried everything I can think of.

I’m on Ubuntu 20.04.2 LTS.

If I use CUDA from the REPL, it works.
If I try it from a Jupyter notebook launched from the same REPL that successfully uses CUDA, I get
could not load library "libcuda.so.1"

The only instance of that filename is:

% locate libcuda.so.1          
/usr/lib/x86_64-linux-gnu/libcuda.so.1

That file is not part of the NVIDIA install of CUDA. It comes from

% dpkg -S /usr/lib/x86_64-linux-gnu/libcuda.so.1                                              
libnvidia-compute-470:amd64: /usr/lib/x86_64-linux-gnu/libcuda.so.1

which is maintained by the Ubuntu core developers. I have reported a bug there, but I don’t really know why this doesn’t work. There goes another day…

The version of CUDA being used should be managed here →

sudo update-alternatives --display cuda
cuda - auto mode
  link best version is /usr/local/cuda-11.4
  link currently points to /usr/local/cuda-11.4
  link cuda is /usr/local/cuda
/usr/local/cuda-11.4 - priority 114

My system is otherwise well-behaved AFAIK

% nvidia-smi
...
NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4 
...
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

The Julia repl is running CUDA fine:

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.4.0
NVIDIA driver 470.57.2

Libraries: 
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+470.57.2
- CUDNN: 8.20.0 (for CUDA 11.3.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce GTX 1050 Ti (sm_61, 3.446 GiB / 3.938 GiB available)

% julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.1 (2021-04-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.6) pkg> add CUDA, IJulia
    Updating registry at `~/.julia/registries/General`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`

shell> ls
common_definitions.jl  FunWithArrays.ipynb	  Introduction_CUDA.ipynb  Lilly_hat.jpg  sneak_peek
DeviceSideRNG.ipynb    ImageProcessing.ipynb	  JuliaSet.ipynb	   Manifest.toml  src
Diffusion.ipynb        Introduction_AMDGPU.ipynb  kernelabstractions	   Project.toml

julia> using IJulia

julia> jupyterlab()

I get the error quoted above (could not load library "libcuda.so.1").

My guess is that both the Jupyter developers and the Ubuntu developers are struggling with “update-alternatives”. The Julia REPL seems to get the right result.

The scenario works in Pluto, but not in Jupyter.

libcuda.so is expected to be provided by the NVIDIA driver, aka libnvidia-*. update-alternatives cuda has nothing to do with this. And if the library is correctly detected from a command line, there was probably no need to file a bug with Ubuntu developers.

The only question is why libcuda.so in /usr/lib/x86_64-linux-gnu isn’t discovered in Jupyter. That path should have been registered by an entry in /etc/ld.so.conf.d. Are you sure you’re running your notebook on the same server? It’s also possible that your system is badly configured, e.g. without an entry for /usr/lib/x86_64-linux-gnu in /etc/ld.so.conf.d, and that you relied on an LD_LIBRARY_PATH entry to discover libcuda.so; that environment variable may have been lost in the Jupyter environment.
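Both possibilities can be checked from a shell. Here is a minimal sketch, not a definitive diagnostic: the helper names `lib_in_ldcache` and `lib_on_path_list` are made up for illustration, and it assumes `ldconfig -p` is available (as it normally is on Ubuntu). Running it once in the terminal where the REPL works and once in the environment that starts Jupyter should show whether the two differ.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Succeeds if the library named in $1 appears in `ldconfig -p`-style
# output read from stdin. The trailing space avoids matching a longer
# name such as libcuda.so.10.
lib_in_ldcache() {
    grep -F -q -- "$1 "
}

# Succeeds if any directory in the colon-separated list $2 contains
# a file named $1. Runs in a subshell so the IFS change stays local.
lib_on_path_list() (
    set -f
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$1" ]; then
            exit 0
        fi
    done
    exit 1
)

# Report which route, if any, would let the loader find libcuda.so.1.
if command -v ldconfig >/dev/null 2>&1 &&
   ldconfig -p 2>/dev/null | lib_in_ldcache libcuda.so.1; then
    echo "libcuda.so.1: registered in the linker cache"
elif lib_on_path_list libcuda.so.1 "${LD_LIBRARY_PATH:-}"; then
    echo "libcuda.so.1: found only via LD_LIBRARY_PATH"
else
    echo "libcuda.so.1: not found by either route"
fi
```

If the terminal reports the LD_LIBRARY_PATH route and the Jupyter side reports nothing, that would point at the lost-environment-variable explanation rather than at the library itself.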

Just got it working in an LXC container with CUDA pass-through.
All the CUDA libraries seem to be behaving themselves with some mixed-parentage…

% locate 'libcuda.so'
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.470.57.02
/usr/local/cuda-11.4/targets/x86_64-linux/lib/stubs/libcuda.so

% sudo update-alternatives --list cuda
/usr/local/cuda-11.4

% locate nvcc        
/usr/local/cuda-11.4/bin/nvcc
/usr/local/cuda-11.4/bin/nvcc.profile

NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4

I should probably update paths or alternatives, but I’m in over my head here, and it works…

I used something like the following Ubuntu 20.04.2 commands:

sudo apt install pip aptitude
pip install jill --user -U

I added the Ubuntu PPA for graphics-drivers and installed the recommended driver.

I also added the repository for NVIDIA drivers, but did not install anything from it.

% sudo apt update
% sudo apt install build-essential
% sudo apt-get install linux-headers-$(uname -r) # nvidia prep advice from nvidia

Then I searched for cuda-toolkit and chose the version that matched the CUDA version installed on the host, which is visible within the container via nvidia-smi.

% sudo aptitude search cuda-toolkit
[...]                                         - CUDA Toolkit 11.2 meta-package                                        
p   cuda-toolkit-11-3                                           - CUDA Toolkit 11.3 meta-package                                        
p   cuda-toolkit-11-3-config-common                             - Common config package for CUDA Toolkit 11.3.                          
i   cuda-toolkit-11-4                                           - CUDA Toolkit 11.4 meta-package                                        
[...]   

The 11.4 meta-package matches.
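The matching step can be sketched in shell. This is only an illustration: `toolkit_package_for_driver` is a made-up helper, and it assumes the packages follow the `cuda-toolkit-MAJOR-MINOR` naming seen in the aptitude listing above. Note that later replies in this thread point out that CUDA.jl does not need the toolkit at all.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi header text on stdin and print
# the toolkit meta-package name implied by its "CUDA Version: X.Y" field.
# Prints nothing if no such field is present.
toolkit_package_for_driver() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/cuda-toolkit-\1-\2/p' |
        head -n 1
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | toolkit_package_for_driver
fi
```

The printed name can then be compared with the `aptitude search cuda-toolkit` listing before installing anything.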

% sudo aptitude install cuda-toolkit

Packages were installed from all the added repositories, and CUDA was then available in both Pluto and Jupyter.

I managed to find the conda package installed by Julia, and activated it so that I could run

% jupyter notebook list               
Currently running servers:
http://localhost:8888/?token=<blah-blah> :: /home/ubuntu

I had ssh’d into the container with port forwarding to allow access to Jupyter:

ssh -L localhost:8888:localhost:8888 $c1.lxd

Note:
CUDA is installed in the base Ubuntu operating system, and is passed through to the container - no install required.

You don’t need the CUDA toolkit, CUDA.jl won’t use it and downloads its own copy. You don’t need the graphics-drivers PPA. Just add the NVIDIA one, and install the cuda-drivers metapackage, that’s all you need to do.

No, the NVIDIA driver is passed through (confusingly providing libcuda.so), CUDA itself isn’t. But you don’t need CUDA.

Thanks very much for your reply.
When you say ‘the NVIDIA one’, it isn’t clear to me what you mean: the driver itself or its provenance?
Ubuntu 20.04 has NVIDIA drivers, and alternatives are also available from NVIDIA.

I had no trouble that I was aware of directly in the REPL. It only worked in the Jupyter notebook installed by the REPL with the packages from Ubuntu-drivers. Success at that stage may have been dumb luck.

There are so many moving parts.

Is there somewhere I can find a specification of the specific prerequisite packages?

Cheers…

It’s always the NVIDIA driver you need, there is no alternative (e.g. nouveau doesn’t work). So I meant the NVIDIA apt repository, because it typically provides more up-to-date drivers.

There aren’t that many moving parts: the only thing you need is the NVIDIA driver module and the corresponding CUDA driver library (libcuda.so). Typically, those are installed together using any of the driver installers from the NVIDIA home page. But since you’re using Ubuntu, I recommend using the NVIDIA apt repository and just installing cuda-drivers. You can use the official Ubuntu one as well; it works equally well, but might be a little older.
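To check that the module and the library really correspond, something like the following could be used. It is a rough sketch under assumptions: `libcuda_version_from_path` is a made-up helper, the library directory is the one reported earlier in this thread, and it relies on the loaded module exposing its version in /sys/module/nvidia/version.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from a path such as
# .../libcuda.so.470.57.02. Prints nothing for names without a dotted
# version, e.g. libcuda.so.1.
libcuda_version_from_path() {
    printf '%s\n' "$1" |
        sed -n 's/.*libcuda\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p'
}

# Compare the loaded kernel module's version with each versioned libcuda.
# The directory below is the one from this thread; adjust it if needed.
if [ -r /sys/module/nvidia/version ]; then
    module_version=$(cat /sys/module/nvidia/version)
    for lib in /usr/lib/x86_64-linux-gnu/libcuda.so.*; do
        lib_version=$(libcuda_version_from_path "$lib")
        [ -n "$lib_version" ] || continue
        if [ "$lib_version" = "$module_version" ]; then
            echo "match: module and $lib are both $module_version"
        else
            echo "mismatch: module $module_version, library $lib_version"
        fi
    done
else
    echo "NVIDIA kernel module not loaded, or its version is not exposed"
fi
```

A mismatch usually means the driver packages were only partially upgraded, or that the module has not been reloaded since the upgrade.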

I very much appreciate how Julia tries to encapsulate and configure its own dependencies.

If only we could distribute libcuda.so too :slightly_smiling_face: It’s actually possible, but only on datacenter hardware, so I haven’t bothered to implement that feature yet. And you’d still need to install a driver, it would just simplify upgrading.

Hi again,

I have spent two days reconfiguring my system to try to get this working.

I think this issue is related: libcuda.so.1 is missing with conda env tf-gpu #11743

I believe my evidence demonstrates that the Jupyter notebook using the Julia package IJulia is not using the same method to find libcuda as the REPL.

In the REPL:

% julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Libdl

julia> Libdl.find_library("libcuda")
"libcuda"

In the notebook with a freshly reset kernel:

[screenshot of the notebook output; image not preserved]

With JupyterLab installed locally:

% conda create -n jupyter-julia python=3.9
% conda activate jupyter-julia
% mamba install -c conda-forge jupyterlab 

Check Julia in this environment:

(jupyter-julia) 
% julia --version
julia version 1.6.2
julia> using Libdl

julia> Libdl.find_library("libcuda")
"libcuda"

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.4.0
NVIDIA driver 470.57.2

Libraries: 
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+470.57.2
- CUDNN: 8.20.0 (for CUDA 11.3.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.2
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce GTX 1050 Ti (sm_61, 3.337 GiB / 3.938 GiB available)

From the newly installed Jupyter system with a 1.6 kernel,

using Libdl
Libdl.find_library("libcuda")
""

I have reported a bug against IJulia: #1015
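One way to narrow this down is to compare the environment the terminal REPL sees with the one the Jupyter kernel sees. Here is a rough sketch: `env_only_in_first` is a made-up helper, and the two files are assumed to hold one NAME=value line per variable (for instance `env > terminal_env.txt` in the terminal, and an equivalent dump written from inside the notebook). The sample values in the demo are invented purely to show the output format.

```shell
#!/bin/sh
# Hypothetical helper: print the NAME=value lines that occur in the
# first environment dump ($1) but not in the second ($2).
env_only_in_first() {
    first_sorted=$(mktemp)
    second_sorted=$(mktemp)
    sort "$1" > "$first_sorted"
    sort "$2" > "$second_sorted"
    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}

# Demo with made-up sample dumps (not taken from the system above).
demo_dir=$(mktemp -d)
printf 'HOME=/home/ubuntu\nLD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu\n' \
    > "$demo_dir/terminal_env.txt"
printf 'HOME=/home/ubuntu\n' > "$demo_dir/kernel_env.txt"
env_only_in_first "$demo_dir/terminal_env.txt" "$demo_dir/kernel_env.txt"
rm -rf "$demo_dir"
```

Anything printed is set, or set differently, in the terminal but not in the kernel. A missing LD_LIBRARY_PATH or a different PATH would be the first things to look at.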