Unable to use CUDA from artifacts

I’m trying to use Julia in an HPC environment. I load the CUDA driver using Slurm’s “module” functionality, and I can see where it is in the $PATH variable. But when I run “pkg> test CUDA” I get warnings saying “This version of CUDA.jl only supports NVIDIA drivers for CUDA 10.2 or higher (yours is for CUDA 9.2.0)”, even though nvcc --version shows that version 10.2 is loaded.

I have read that artifacts are downloaded with Julia, so that an appropriate version of the NVIDIA libraries is downloaded along with CUDA.jl. When I run CUDA.versioninfo(), however, I get the following warning (followed by an error):
┌ Warning: Unable to use CUDA from artifacts: Could not find or download a compatible artifact for your platform (x86_64-linux-gnu-libgfortran4-cxx11-julia_version+1.6.7).

I’m not sure how to check whether these packages were obtained. Ideas?

@maleadt may be able to best answer this question about CUDA.

Frankly, the original title did not make the issue obvious, but reading the content makes it clear you are specifically interested in CUDA.

The artifacts in question originate from the JuliaBinaryWrappers/CUDA_jll.jl repository on GitHub.

I’m confused about the error message. Is it due to a lack of network access, or is it because this platform triple is not one that Julia normally supports? Pinging @giordano to comment on that.

I would also recommend looking at this documentation:
https://cuda.juliagpu.org/stable/installation/overview/

As described in the link above, try setting the environment variable JULIA_DEBUG to “CUDA” and share the results.
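For example, a minimal way to do that from the Julia REPL (a sketch; you can equally export JULIA_DEBUG=CUDA in your shell before starting Julia):

```julia
# Enable debug-level logging for CUDA.jl, then trigger initialization so
# the discovery messages (driver, toolkit, artifacts) are printed.
ENV["JULIA_DEBUG"] = "CUDA"

using CUDA
CUDA.versioninfo()
```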

Also, could you please share the full error message that follows the warning?

If you know any additional information about how Julia was installed on your system that would also be helpful.

xref: Path to CUDA driver

The toolkit is different from the driver. libcuda is discovered on the library search path, so check ldconfig -p.
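If you want to do that check from within Julia instead of the shell, here is a minimal sketch using the standard library Libdl (this is not CUDA.jl’s own discovery code, just the same idea):

```julia
using Libdl

# Returns the resolved library name if the dynamic linker can locate it,
# or an empty string if libcuda is not on the library search path.
Libdl.find_library(["libcuda", "libcuda.so.1"])
```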

No, we only download a toolkit that’s appropriate for your driver. We cannot use our own driver, as installing one requires administrative permissions (and it is tied to the active kernel). The reason it fails is that we do not support the NVIDIA driver for CUDA 9.2, so we don’t provide artifacts for it.
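For reference, on a node where CUDA.jl does initialize you can compare the driver’s supported CUDA version against the runtime it picked; this is a sketch assuming a CUDA.jl version that exposes these helpers:

```julia
using CUDA

# The CUDA version the installed NVIDIA driver supports, and the CUDA
# runtime CUDA.jl ended up using. On this cluster the driver only
# supports CUDA 9.2, which is below CUDA.jl's minimum.
@show CUDA.driver_version()
@show CUDA.runtime_version()
```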


If you can run the nvidia-smi command on the available nodes, it will show the GPU model (which determines what drivers can be used), and the currently installed driver, including its CUDA compatibility.
If your administrator has provided CUDA 10.2 as a module, there may be some nodes where it is actually usable, and some way to assign your task to them.
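If you want to script that check on each node (for example inside a Slurm job), here is a sketch that just shells out to nvidia-smi from Julia, assuming it is on the node’s PATH:

```julia
# Print the GPU model and the installed driver version for the current node.
run(`nvidia-smi --query-gpu=name,driver_version --format=csv`)
```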


It looks like you are experiencing the same problem as in this post.

As explained in that post, you should be able to use the CUDA installation provided on your cluster without downloading anything extra. To prevent CUDA.jl from downloading artifacts, set JULIA_CUDA_USE_BINARYBUILDER=false.
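For example, with the cluster’s CUDA module loaded, something along these lines (a sketch; the variable must be set before CUDA.jl is first loaded in the session) makes CUDA.jl pick up the local toolkit instead of downloading artifacts:

```julia
# Tell CUDA.jl to use the locally installed CUDA toolkit rather than
# downloading its own; must be set before `using CUDA`.
ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"

using CUDA
CUDA.versioninfo()  # should now report the cluster-provided toolkit
```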

It would be nice if CUDA.jl were able to look for a valid local CUDA installation first, and only start the download process if that fails. Does that make sense?