I’m trying to use Julia in an HPC environment. I load the CUDA driver using Slurm’s “module” functionality, and I can see where it is in the $PATH variable. But when I run “pkg> test CUDA” I get warnings saying “This version of CUDA.jl only supports NVIDIA drivers for CUDA 10.2 or higher (yours is for CUDA 9.2.0)”, even though nvcc --version shows that version 10.2 is loaded.
I have read that CUDA.jl uses Julia’s Artifacts system to download an appropriate version of the NVIDIA drivers. When I run CUDA.versioninfo(), however, I get the following warning (followed by an error):
┌ Warning: Unable to use CUDA from artifacts: Could not find or download a compatible artifact for your platform (x86_64-linux-gnu-libgfortran4-cxx11-julia_version+1.6.7).
I’m not sure how to check whether these packages were obtained. Ideas?
I’m confused about the error message. Is it due to the lack of network access, or is it because this platform triple is not one that Julia normally supports? Pinging @giordano to comment on that.
The toolkit is different from the driver. libcuda is discovered on the library search path, so check ldconfig -p.
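A minimal sketch of that check, assuming a Linux node where ldconfig is on the PATH (on some systems it lives in /sbin). The helper name filter_libcuda is made up for illustration. The nvcc call is included only for contrast: it reports the toolkit version and says nothing about the driver.

```shell
# Hypothetical helper: keep only the driver library (libcuda.so*) from
# `ldconfig -p` style output. The fixed-string match deliberately skips
# toolkit libraries such as libcudart.so.
filter_libcuda() {
  grep -F 'libcuda.so'
}

# Driver library as seen by the dynamic linker cache, if any.
if command -v ldconfig >/dev/null 2>&1; then
  ldconfig -p | filter_libcuda || echo "no libcuda in the linker cache"
fi

# Toolkit version, for comparison only.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
fi
```

Directories added through LD_LIBRARY_PATH (which module systems often set) are not part of the ldconfig cache, so it may be worth inspecting that variable as well.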
No, we only download a toolkit that’s appropriate for your driver. We cannot use our own driver, as that requires administrative permissions (and is tied to the active kernel). The reason it fails is because we do not support the NVIDIA driver for CUDA 9.2, and we don’t provide artifacts for it.
If you can run the nvidia-smi command on the available nodes, it will show the GPU model (which determines what drivers can be used), and the currently installed driver, including its CUDA compatibility.
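As a rough sketch, the following wraps that check in a function so it can be run on whichever node you land on. The function name report_gpu is invented here. The query fields name and driver_version are standard nvidia-smi options. The "CUDA Version" field in the header is, as far as I know, only printed by newer drivers, so an older driver may not show it.

```shell
# Hypothetical helper: print GPU model and driver version for the current node.
report_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV row per GPU: model name and installed driver version.
    nvidia-smi --query-gpu=name,driver_version --format=csv
    # Newer drivers also print the highest supported CUDA version in the header.
    nvidia-smi | grep -i "CUDA Version" || echo "no CUDA Version field in the header"
  else
    echo "nvidia-smi not found on $(uname -n)"
  fi
}

report_gpu
```

On a Slurm cluster this would normally be run on a compute node rather than the login node, for example through srun with whatever GPU request options your site uses.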
If your administrator has provided CUDA 10.2 as a module, there may be some nodes where it is actually usable, and some way to assign your task to them.
It looks like you are experiencing the same problem as in this post.
As explained in the post, you should be able to use the CUDA installation provided on your cluster without downloading anything extra. To prevent CUDA.jl from downloading artifacts, set the environment variable JULIA_CUDA_USE_BINARYBUILDER=false.
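For example, something like the following in a job script or interactive session, assuming a CUDA.jl version that still honors this variable (later releases may configure the toolkit differently) and that the cluster's CUDA module is already loaded:

```shell
# Tell CUDA.jl not to download toolkit artifacts and to look for a local install.
export JULIA_CUDA_USE_BINARYBUILDER=false
echo "JULIA_CUDA_USE_BINARYBUILDER=$JULIA_CUDA_USE_BINARYBUILDER"

# Then check what CUDA.jl finds. Guarded so the snippet is harmless where
# Julia is not on the PATH.
if command -v julia >/dev/null 2>&1; then
  julia -e 'using CUDA; CUDA.versioninfo()' || echo "CUDA.versioninfo() failed"
fi
```

This only affects where the toolkit comes from. The driver is still whatever the node provides, so the driver version requirement still applies.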
It would be nice if CUDA.jl looked first for a valid local CUDA installation and only started the download process if that fails. Does that make sense?