Hello. I'm trying to install CUDA.jl inside an interactive session on a cluster node with an NVIDIA Tesla P100 GPU. The output of nvidia-smi is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:03:00.0 Off | 0 |
| N/A 27C P0 28W / 250W | 290MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Running versioninfo() in Julia returns:
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)
And here is the output when attempting to install CUDA.jl:
(v1.10) pkg> add CUDA
Installing known registries into `~/.julia`
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Installed GPUArraysCore ─────────────── v0.1.6
Installed Scratch ───────────────────── v1.2.1
Installed Crayons ───────────────────── v4.1.1
Installed Adapt ─────────────────────── v4.0.4
Installed ColorTypes ────────────────── v0.11.5
Installed TableTraits ───────────────── v1.0.1
Installed Preferences ───────────────── v1.4.3
Installed CUDA_Driver_jll ───────────── v0.8.1+0
Installed LLVMLoopInfo ──────────────── v1.0.0
Installed GPUCompiler ───────────────── v0.26.4
Installed DataAPI ───────────────────── v1.16.0
Installed SentinelArrays ────────────── v1.4.1
Installed Parsers ───────────────────── v2.8.1
Installed Tables ────────────────────── v1.11.1
Installed FixedPointNumbers ─────────── v0.8.4
Installed PrettyTables ──────────────── v2.3.1
Installed PooledArrays ──────────────── v1.4.3
Installed TimerOutputs ──────────────── v0.5.23
Installed JLLWrappers ───────────────── v1.5.0
Installed InlineStrings ─────────────── v1.4.0
Installed StaticArraysCore ──────────── v1.4.2
Installed AbstractFFTs ──────────────── v1.5.0
Installed IteratorInterfaceExtensions ─ v1.0.0
Installed StaticArrays ──────────────── v1.9.3
Installed PrecompileTools ───────────── v1.2.1
Installed CUDA_Runtime_Discovery ────── v0.2.4
Installed DataValueInterfaces ───────── v1.0.0
Installed LLVMExtra_jll ─────────────── v0.0.29+0
Installed NVTX ──────────────────────── v0.3.4
Installed NVTX_jll ──────────────────── v3.1.0+2
Installed OrderedCollections ────────── v1.6.3
Installed LaTeXStrings ──────────────── v1.3.1
Installed CEnum ─────────────────────── v0.5.0
Installed UnsafeAtomicsLLVM ─────────── v0.1.3
Installed InvertedIndices ───────────── v1.3.0
Installed JuliaNVTXCallbacks_jll ────── v0.2.1+0
Installed CUDA_Runtime_jll ──────────── v0.12.1+0
Installed Reexport ──────────────────── v1.2.2
Installed BFloat16s ─────────────────── v0.5.0
Installed GPUArrays ─────────────────── v10.1.0
Installed Random123 ─────────────────── v1.7.0
Installed RandomNumbers ─────────────── v1.5.3
Installed DataFrames ────────────────── v1.6.1
Installed Requires ──────────────────── v1.3.0
Installed Compat ────────────────────── v4.14.0
Installed ExprTools ─────────────────── v0.1.10
Installed DataStructures ────────────── v0.18.20
Installed MacroTools ────────────────── v0.5.13
Installed KernelAbstractions ────────── v0.9.18
Installed Colors ────────────────────── v0.12.10
Installed UnsafeAtomics ─────────────── v0.2.1
Installed Missings ──────────────────── v1.2.0
Installed StringManipulation ────────── v0.3.4
Installed SortingAlgorithms ─────────── v1.2.1
Installed Atomix ────────────────────── v0.1.0
Installed LLVM ──────────────────────── v6.6.3
Installed CUDA ──────────────────────── v5.3.1
Downloaded artifact: LLVMExtra
Downloaded artifact: NVTX
Downloaded artifact: JuliaNVTXCallbacks
Updating `/.../.julia/environments/v1.10/Project.toml`
[052768ef] + CUDA v5.3.1
Updating `/.../.julia/environments/v1.10/Manifest.toml`
Downloaded artifact: CUDA_Driver
[pid 47305] waiting for IO to finish:
Handle type uv_handle_t->data
timer 0x1cd5200->0x2b840063b8e0
This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
[pid 47305] waiting for IO to finish:
Handle type uv_handle_t->data
timer 0x1cd5200->0x2b840063b8e0
This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
It stayed in this state for at least 30 minutes. The odd thing is that the whole process works perfectly on another GPU node of the same cluster, one with older Tesla K80 GPUs. On the K80 node, CUDA.versioninfo() returns:
CUDA runtime 11.8, artifact installation
CUDA driver 11.2
NVIDIA driver 460.73.1
CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+460.73.1
Julia packages:
- CUDA: 5.3.1
- CUDA_Driver_jll: 0.8.1+0
- CUDA_Runtime_jll: 0.12.1+0
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: Tesla K80 (sm_37, 11.170 GiB / 11.173 GiB available)
1: Tesla K80 (sm_37, 11.170 GiB / 11.173 GiB available)
Installation of CUDA.jl on the K80 node also took under 5 minutes, which points to a problem with the P100 setup rather than with CUDA.jl itself. Any hints as to where the problem might lie would be much appreciated!
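From the log, the hang seems to happen while fetching/precompiling the CUDA_Runtime artifact. One thing I am considering trying (a sketch, untested on my side; the v"12.0" value is an assumption based on the CUDA version that nvidia-smi reports above) is telling CUDA.jl to use the toolkit already installed on the node instead of the downloaded runtime artifact:

```julia
using CUDA

# Ask CUDA.jl to use the node's locally installed CUDA toolkit instead of
# downloading the CUDA_Runtime_jll artifact (which is where the install
# appears to hang). v"12.0" is taken from nvidia-smi's "CUDA Version".
CUDA.set_runtime_version!(v"12.0"; local_toolkit = true)

# This writes the preference to LocalPreferences.toml; Julia then needs a
# restart, after which CUDA.versioninfo() should show which runtime was used.
```

Would that be a reasonable workaround here, or does the hang suggest something else is wrong with the node?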