A newbie cannot use CUDA by following the official docs!

I am more than a bit confused here.
The output of nvidia-smi looks OK to me, and a driver version is printed.
I thought the NVIDIA driver had to be loaded for that command to work?
I definitely stand to be corrected.

nvidia-smi uses libnvml, and depending on the Linux distribution it’s possible that libnvml is installed without libcuda being available. Or, on a cluster (where there can be some forward compatibility of multiple libcudas given a single NVIDIA driver), it might be required to module load cuda.

But you’re right, it’s more likely that libcuda.so is installed but not discoverable. In that case, the distribution package (or the user, if the NVIDIA driver is installed manually) is responsible for adding a ld.so.conf entry to the libcuda path if it isn’t part of the default library search path (or set LD_LIBRARY_PATH if that’s not possible).
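As a rough sketch of how to check discoverability: on glibc-based systems, `ldconfig -p` prints the dynamic linker's cache, so filtering it for libcuda shows which copies (if any) the linker can find without LD_LIBRARY_PATH. The helper name `libcuda_entries` is made up for this example, and it assumes the usual `name (arch) => path` layout of `ldconfig -p` output.

```shell
# Hypothetical helper: reads `ldconfig -p`-style output on stdin and
# prints the resolved path of every libcuda entry found in it.
libcuda_entries() {
    grep -F 'libcuda.so' | awk -F' => ' '{print $2}'
}

# Typical use (ldconfig may live in /sbin on some distributions):
#   ldconfig -p | libcuda_entries
```

If this prints nothing, the library is either not installed or its directory is missing from the ld.so.conf entries.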

Thank you! That is helpful.

The driver is installed (I mean Linux recognizes it, so I am assuming it is fine). I followed the post "CuArray can't find libcuda" (#7, by cuchxq) to make the library discoverable, but it still fails.

When I write

using Libdl
Libdl.dlopen("/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so")

it outputs:

Ptr{Nothing} @0x0000000002b63b40

But what you’re loading here is an unusable version of the driver. stubs/libcuda.so doesn’t contain any functionality. You need the actual libcuda.so.

Where can I find it? I use

locate libcuda.so

and I get a bunch of output. Is this the way to find it?

That’s one way, yes. You can paste it here, or in a gist if it’s too long. It would be weird if you have plenty of libcuda.so files though.

This is the output, which is long and confusing:

/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/local/cuda-10.2/doc/man/man7/libcuda.so.7
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/share/man/man7/libcuda.so.7.gz
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so.390.129
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so.435.21
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so.440.33.01
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so.440.33.01

The one in /usr/lib/x86_64-linux-gnu is the one you should be using, and /usr/lib/x86_64-linux-gnu should be on the search path. Did you maybe override that by prioritizing /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/?
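A minimal sketch for narrowing a long `locate` listing like the one above down to plausible driver libraries. It assumes the real libcuda never lives in a stubs directory, a man page directory, or a Flatpak runtime; the function name `real_libcuda_candidates` is hypothetical.

```shell
# Hypothetical helper: reads paths on stdin and drops stub libraries,
# man pages, and copies bundled inside Flatpak runtimes.
real_libcuda_candidates() {
    grep -v -e '/stubs/' -e '/man/' -e '^/var/lib/flatpak/'
}

# Typical use:
#   locate libcuda.so | real_libcuda_candidates
```

On the listing in this thread, that would leave only the three entries under /usr/lib/x86_64-linux-gnu.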

When I type

echo $LD_LIBRARY_PATH

I get the following:

:/home/mehrdad/gsl/lib
:/home/mehrdad/.openmpi/lib/
:/home/mehrdad/gsl/lib
:/home/mehrdad/.openmpi/lib/
:/usr/local/cuda/lib64/stubs/
:/usr/lib64/:/usr/local/cuda/lib64/
:/usr/local/cuda-10.2/lib64/
:/usr/local/cuda-10.2/lib64/stubs/

I think there is a bunch of junk in there, but I do not know where this variable is set or how to delete/add entries in it.
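To make output like this easier to read, here is a small sketch that prints a colon-separated path list one entry per line, skipping empty entries and repeats. The function name `list_path_entries` is made up for illustration.

```shell
# Hypothetical helper: split a colon-separated list (first argument)
# into lines, dropping empty entries and duplicates while keeping order.
list_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++'
}

# Typical use:
#   list_path_entries "$LD_LIBRARY_PATH"
```

Any line containing stubs in the result is an entry that can shadow the real driver library.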

I deleted everything in LD_LIBRARY_PATH and then added only /usr/lib/x86_64-linux-gnu. Libdl still reports "/usr/local/cuda/lib64/stubs/libcuda.so" as the path for libcuda. I assumed it would return the path I gave in LD_LIBRARY_PATH. How can I change this? (I really appreciate your patience with me!)

Check /etc/ld.so.conf or the files in /etc/ld.so.conf.d. You should not have to add /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH, it should already have an entry there. But more importantly, there should be no stubs entry there. LD_LIBRARY_PATH could also be set in /etc/profile or /etc/profile.d/*, or in a variety of places in your home folder (.bashrc, .profile, etc).
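A rough sketch of how one might search those places for a stray stubs entry. The function name `find_stubs_entries` is hypothetical, and the list of files in the usage comment is only the set mentioned above; your system may set the variable elsewhere.

```shell
# Hypothetical helper: search the given files and directories for lines
# mentioning "stubs".
#   -r recurse into directories    -H always print the file name
#   -n print line numbers          -s stay quiet about missing files
find_stubs_entries() {
    grep -rHns 'stubs' "$@"
}

# Typical use:
#   find_stubs_entries /etc/ld.so.conf /etc/ld.so.conf.d \
#       /etc/profile /etc/profile.d ~/.bashrc ~/.profile
```

Each hit is printed as file:line:content, which points at the exact line to remove or edit.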

I added the path manually in .bashrc. I then restarted and now Libdl finds the library correctly and the examples all work. Thank you sir for your patience. I really appreciate it.


Thanks @Mehrdad_Esfahani and @maleadt for this post. It helped me resolve the exact same problem on my system! I added this line to my .bashrc to remove the stubs:

    export LD_LIBRARY_PATH=$(echo "$LD_LIBRARY_PATH" | tr ":" "\n" | grep -v "stubs" | paste -sd ":" -)

By the way - would it be possible for CUDA.jl to automatically ignore the stub and use the real library instead?