Oh, so it’s actually cuInit that fails. Try the following then:
julia> using CUDAdrv
julia> CUDAdrv.CUDAapi.@runtime_ccall((:cuInit, CUDAdrv.__libcuda), CUDAdrv.CUresult, (UInt32,), 0)
CUDA_SUCCESS::cudaError_enum = 0x00000000
julia> Int(ans)
0
This is the error I get:
UnknownMember::cudaError_enum = 0xffffffff
This return code of -1 indicates you are using the CUDA stub libraries. You probably have not installed the NVIDIA driver, or don’t have it at a discoverable path. Try:
julia> using Libdl
julia> Libdl.dlpath("libcuda")
"/usr/bin/../lib/libcuda.so"
Also, could you try the failing example again, launching julia with --startup-file=no, to see whether the failure to render that exception is caused by some code you have loaded? Please also show the output of using Logging; show(global_logger()) when you run your script. Be sure to show exactly the code you are running and in which environment (e.g. a REPL, Juno, or whatnot); I’d like to know what’s causing the output issue.
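As a rough check, the shell sketch below classifies a path such as the one printed by Libdl.dlpath("libcuda") as a stub or not. It is only a heuristic: it assumes, based on the paths seen in this thread, that the stub library always sits in a directory named stubs. is_cuda_stub is a hypothetical helper name, not part of any CUDA or Julia tooling.

```shell
# Hypothetical helper: heuristic stub detection by path only.
# Assumption: the CUDA toolkit's stub libcuda.so lives under a ".../stubs/"
# directory (e.g. /usr/local/cuda/lib64/stubs/), while the real driver library
# is installed elsewhere (e.g. /usr/lib/x86_64-linux-gnu/).
is_cuda_stub() {
    case "$1" in
        */stubs/*) return 0 ;;  # path runs through a stubs directory
        *) return 1 ;;          # anything else is assumed to be a real library
    esac
}

# Usage, with the path reported by Libdl.dlpath("libcuda"):
#   is_cuda_stub "/usr/local/cuda/lib64/stubs/libcuda.so" && echo "stub library"
```

A symlink named libcuda.so outside a stubs directory could still point at a stub, so resolving the path first (for example with readlink -f) makes the check more reliable.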
When I run
using Libdl
Libdl.dlpath("libcuda")
I get
"/usr/local/cuda/lib64/stubs/libcuda.so"
I start Julia by typing julia --startup-file=no. Then I execute using IJulia; notebook().
I ran the example with the additional commands. It still fails and produces the previous error. The new commands print the following:
Base.CoreLogging.SimpleLogger(IJulia.IJuliaStdio{Base.PipeEndpoint}(IOContext(Base.PipeEndpoint(RawFD(0x0000002d) open, 0 bytes waiting))), Info, Dict{Any,Int64}())
Aha, so IJulia messes up the logging. Good to know.
Anyway, the dlpath output confirms my suspicion. You need to install the NVIDIA driver or, if it is already installed, make sure its libcuda is discoverable.
I am more than a bit confused here.
The output of nvidia-smi looks OK to me, and a driver version is printed.
I thought the NVIDIA driver had to be loaded for that command to work?
I definitely stand to be corrected.
nvidia-smi uses libnvml, and depending on the Linux distribution it’s possible for that to be installed without libcuda being available. Or, on a cluster (where multiple libcuda versions can be offered on top of a single NVIDIA driver, thanks to forward compatibility), you might be required to run module load cuda.
But you’re right, it’s more likely that libcuda.so is installed but not discoverable. In that case, the distribution package (or the user, if the NVIDIA driver was installed manually) is responsible for adding an ld.so.conf entry for the directory containing libcuda if it isn’t part of the default library search path (or for setting LD_LIBRARY_PATH if that’s not possible).
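The ld.so.conf entries are compiled into the dynamic linker cache by ldconfig, and ldconfig -p prints that cache, so one way to check whether a real libcuda is registered is to filter that output. The sketch below assumes glibc's output format, where each entry ends in "=> /path/to/library"; libcuda_cache_paths is a hypothetical helper. Keep in mind that directories in LD_LIBRARY_PATH are searched before the cache, so a stubs entry there takes precedence over anything listed here.

```shell
# Hypothetical helper: print the paths of libcuda entries in the linker cache.
# Assumes glibc `ldconfig -p` lines of the form
#   libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
# i.e. the library name is the first field and the path is the last field.
libcuda_cache_paths() {
    awk '$1 ~ /^libcuda\.so/ { print $NF }'
}

# Usage (ldconfig is often in /sbin, which may not be on a regular user's PATH):
#   /sbin/ldconfig -p | libcuda_cache_paths
```

The pattern deliberately requires "libcuda.so" right after the name prefix, so runtime libraries such as libcudart are not reported.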
Thank you! That is helpful.
The driver is installed (I mean Linux recognizes it, so I am assuming it is fine). I followed this post, CuArray can't find libcuda - #7 by cuchxq, to make the library discoverable, but it still fails.
When I write
using Libdl
Libdl.dlopen("/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so")
it outputs:
Ptr{Nothing} @0x0000000002b63b40
But what you’re loading here is an unusable version of the driver. stubs/libcuda.so doesn’t contain any functionality. You need the actual libcuda.so.
Where can I find it? I ran
locate libcuda.so
and got a lot of output. Is this the way to find it?
That’s one way, yes. You can paste it here, or in a gist if it’s too long. It would be weird if you had plenty of libcuda.so files, though.
This is the output, which is long and confusing:
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/local/cuda-10.2/doc/man/man7/libcuda.so.7
/usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/share/man/man7/libcuda.so.7.gz
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-390-129/x86_64/1.4/a392a94e710ba4116ced1538ca146cc456aa470a623cefd1c6a91f043dc7b210/files/extra/libcuda.so.390.129
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-435-21/x86_64/1.4/5662dd4a55bd69b95299a79814cf5bd788b6f84b16fe66ec9e903bf380855f9f/files/extra/libcuda.so.435.21
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL.nvidia-440-33-01/x86_64/1.4/7bd82123228618c583180e638e20a3c7b3ca821dba1eb5ce5e73d16ab4dc0c12/files/extra/libcuda.so.440.33.01
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so.1
/var/lib/flatpak/runtime/org.freedesktop.Platform.GL32.nvidia-440-33-01/x86_64/1.4/90dd8243590b0eb3630bc10ae4039a0110bb550d06059d88acda7ff6b53616c3/files/extra/libcuda.so.440.33.01
The libcuda.so in /usr/lib/x86_64-linux-gnu is the one you should be using, and /usr/lib/x86_64-linux-gnu should be on the search path. Did you maybe override that by prioritizing /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/?
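To narrow a locate listing like the one above down to files that could be the host's real driver library, one can drop the stub copies, the man pages, and the Flatpak runtime copies (those belong to sandboxed applications). The sketch below does this with grep; the helper name and the exclusion patterns are assumptions chosen to match this particular listing, not a general rule.

```shell
# Hypothetical helper: reads paths (one per line) on stdin and keeps only
# those that look like a real libcuda driver library on the host.
filter_libcuda_candidates() {
    # Keep names ending in libcuda.so, libcuda.so.1 or libcuda.so.<version>,
    # then drop stub libraries, man pages and Flatpak runtime copies.
    grep -E '/libcuda\.so(\.[0-9.]+)?$' |
        grep -v -e '/stubs/' -e '/man/' -e '/flatpak/'
}

# Usage:
#   locate libcuda.so | filter_libcuda_candidates
```

On the listing above, this should leave only the three files in /usr/lib/x86_64-linux-gnu.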
When I type
echo $LD_LIBRARY_PATH
I get the following:
:/home/mehrdad/gsl/lib
:/home/mehrdad/.openmpi/lib/
:/home/mehrdad/gsl/lib
:/home/mehrdad/.openmpi/lib/
:/usr/local/cuda/lib64/stubs/
:/usr/lib64/:/usr/local/cuda/lib64/
:/usr/local/cuda-10.2/lib64/
:/usr/local/cuda-10.2/lib64/stubs/
I think there is a lot of junk in there, but I do not know where to find the file that sets it, so I can’t delete or add entries.
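A quick way to see what is actually in the variable is to print it one entry per line. The sketch below does that and flags the two kinds of entries that matter here: directories containing "stubs", and empty entries (such as the one produced by the leading colon in the output above), which the glibc dynamic linker treats as the current directory. show_ld_library_path is a hypothetical helper name.

```shell
# Hypothetical helper: list the entries of a colon-separated library path,
# one per line, marking stub directories and empty entries.
show_ld_library_path() {
    if [ -z "$1" ]; then
        echo "(LD_LIBRARY_PATH is empty)"
        return 0
    fi
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
        case "$entry" in
            "") echo "(empty entry: current directory)" ;;
            *stubs*) echo "$entry   <- stub directory" ;;
            *) echo "$entry" ;;
        esac
    done
}

# Usage:
#   show_ld_library_path "$LD_LIBRARY_PATH"
```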
I deleted everything in LD_LIBRARY_PATH and then added only /usr/lib/x86_64-linux-gnu. Libdl.dlpath for libcuda still returns "/usr/local/cuda/lib64/stubs/libcuda.so". I assume it should return a path in the directory I put in LD_LIBRARY_PATH? How can I change this? (I really appreciate your patience with me!)
Check /etc/ld.so.conf or the files in /etc/ld.so.conf.d. You should not have to add /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH; it should already have an entry there. More importantly, there should be no stubs entry there. LD_LIBRARY_PATH could also be set in /etc/profile or /etc/profile.d/*, or in a variety of places in your home folder (.bashrc, .profile, etc.).
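To track down where a stubs entry comes from, one option is to grep the candidate files for LD_LIBRARY_PATH or stubs. The helper below is a hypothetical sketch; the list of files in the usage comment is an assumption and not exhaustive, since shells and desktop sessions can read other startup files too.

```shell
# Hypothetical helper: print "file:line:text" for every line in the given files
# that mentions LD_LIBRARY_PATH or a stubs directory. Missing or unreadable
# files are skipped silently (grep's error messages are discarded).
find_ld_path_settings() {
    grep -Hn -e 'LD_LIBRARY_PATH' -e 'stubs' "$@" 2>/dev/null
}

# Usage:
#   find_ld_path_settings ~/.bashrc ~/.profile ~/.bash_profile \
#       /etc/profile /etc/profile.d/* /etc/environment \
#       /etc/ld.so.conf /etc/ld.so.conf.d/*
```

If the stubs entry turns up in /etc/ld.so.conf or /etc/ld.so.conf.d, the linker cache has to be rebuilt with sudo ldconfig after removing it; edits to shell startup files only take effect in newly started shells.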
I added the path manually in .bashrc. I then restarted, and now Libdl finds the library correctly and all the examples work. Thank you, sir, for your patience. I really appreciate it.
Thanks @Mehrdad_Esfahani and @maleadt for this post. It helped me resolve the exact same problem on my system! I added this line to my .bashrc to remove the stubs:
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | tr ":" "\n" | grep -v "stubs" | tr "\n" ":"`
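The tr-based line above leaves a trailing colon behind (and keeps any empty entry that was already present), and empty entries are treated by the glibc dynamic linker as the current directory. A slightly more defensive variant, sketched below as a hypothetical strip_stub_dirs function, also drops empty entries and joins the rest with paste; it still removes every entry whose path contains "stubs", just like the original.

```shell
# Hypothetical helper: remove stub directories and empty entries from a
# colon-separated path, joining the remaining entries with ':' so that no
# leading or trailing colon is produced.
strip_stub_dirs() {
    # Split into one entry per line, drop stub and empty entries,
    # then join the survivors back together with ':'.
    printf '%s\n' "$1" |
        tr ':' '\n' |
        grep -v -e 'stubs' -e '^$' |
        paste -sd ':' -
}

# In ~/.bashrc (unset the variable entirely if nothing is left):
#   new_path="$(strip_stub_dirs "$LD_LIBRARY_PATH")"
#   if [ -n "$new_path" ]; then export LD_LIBRARY_PATH="$new_path"; else unset LD_LIBRARY_PATH; fi
```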
By the way, would it be possible for CUDA.jl to automatically ignore the stub and use the real library instead?