Using CUDA hangs with P100 GPU

Hello. I’m trying to install CUDA.jl inside an interactive session on a cluster node with an NVIDIA Tesla P100 GPU. The output of nvidia-smi is

| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   27C    P0    28W / 250W |    290MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |

Running versioninfo() in Julia returns

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)

And here is the output when attempting to install CUDA.jl

(v1.10) pkg> add CUDA
  Installing known registries into `~/.julia`
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
   Installed GPUArraysCore ─────────────── v0.1.6
   Installed Scratch ───────────────────── v1.2.1
   Installed Crayons ───────────────────── v4.1.1
   Installed Adapt ─────────────────────── v4.0.4
   Installed ColorTypes ────────────────── v0.11.5
   Installed TableTraits ───────────────── v1.0.1
   Installed Preferences ───────────────── v1.4.3
   Installed CUDA_Driver_jll ───────────── v0.8.1+0
   Installed LLVMLoopInfo ──────────────── v1.0.0
   Installed GPUCompiler ───────────────── v0.26.4
   Installed DataAPI ───────────────────── v1.16.0
   Installed SentinelArrays ────────────── v1.4.1
   Installed Parsers ───────────────────── v2.8.1
   Installed Tables ────────────────────── v1.11.1
   Installed FixedPointNumbers ─────────── v0.8.4
   Installed PrettyTables ──────────────── v2.3.1
   Installed PooledArrays ──────────────── v1.4.3
   Installed TimerOutputs ──────────────── v0.5.23
   Installed JLLWrappers ───────────────── v1.5.0
   Installed InlineStrings ─────────────── v1.4.0
   Installed StaticArraysCore ──────────── v1.4.2
   Installed AbstractFFTs ──────────────── v1.5.0
   Installed IteratorInterfaceExtensions ─ v1.0.0
   Installed StaticArrays ──────────────── v1.9.3
   Installed PrecompileTools ───────────── v1.2.1
   Installed CUDA_Runtime_Discovery ────── v0.2.4
   Installed DataValueInterfaces ───────── v1.0.0
   Installed LLVMExtra_jll ─────────────── v0.0.29+0
   Installed NVTX ──────────────────────── v0.3.4
   Installed NVTX_jll ──────────────────── v3.1.0+2
   Installed OrderedCollections ────────── v1.6.3
   Installed LaTeXStrings ──────────────── v1.3.1
   Installed CEnum ─────────────────────── v0.5.0
   Installed UnsafeAtomicsLLVM ─────────── v0.1.3
   Installed InvertedIndices ───────────── v1.3.0
   Installed JuliaNVTXCallbacks_jll ────── v0.2.1+0
   Installed CUDA_Runtime_jll ──────────── v0.12.1+0
   Installed Reexport ──────────────────── v1.2.2
   Installed BFloat16s ─────────────────── v0.5.0
   Installed GPUArrays ─────────────────── v10.1.0
   Installed Random123 ─────────────────── v1.7.0
   Installed RandomNumbers ─────────────── v1.5.3
   Installed DataFrames ────────────────── v1.6.1
   Installed Requires ──────────────────── v1.3.0
   Installed Compat ────────────────────── v4.14.0
   Installed ExprTools ─────────────────── v0.1.10
   Installed DataStructures ────────────── v0.18.20
   Installed MacroTools ────────────────── v0.5.13
   Installed KernelAbstractions ────────── v0.9.18
   Installed Colors ────────────────────── v0.12.10
   Installed UnsafeAtomics ─────────────── v0.2.1
   Installed Missings ──────────────────── v1.2.0
   Installed StringManipulation ────────── v0.3.4
   Installed SortingAlgorithms ─────────── v1.2.1
   Installed Atomix ────────────────────── v0.1.0
   Installed LLVM ──────────────────────── v6.6.3
   Installed CUDA ──────────────────────── v5.3.1
  Downloaded artifact: LLVMExtra
  Downloaded artifact: NVTX
  Downloaded artifact: JuliaNVTXCallbacks
    Updating `/.../.julia/environments/v1.10/Project.toml`
  [052768ef] + CUDA v5.3.1
    Updating `/.../.julia/environments/v1.10/Manifest.toml`
  Downloaded artifact: CUDA_Driver

[pid 47305] waiting for IO to finish:
 Handle type        uv_handle_t->data
 timer              0x1cd5200->0x2b840063b8e0
This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.

[pid 47305] waiting for IO to finish:
 Handle type        uv_handle_t->data
 timer              0x1cd5200->0x2b840063b8e0
This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.

It stayed in this state for at least 30 minutes. Oddly, the same process works perfectly on another GPU node of the same cluster, which has older Tesla K80 GPUs. On the K80 node, CUDA.versioninfo() returns

CUDA runtime 11.8, artifact installation
CUDA driver 11.2
NVIDIA driver 460.73.1

CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+460.73.1

Julia packages:
- CUDA: 5.3.1
- CUDA_Driver_jll: 0.8.1+0
- CUDA_Runtime_jll: 0.12.1+0

- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: Tesla K80 (sm_37, 11.170 GiB / 11.173 GiB available)
  1: Tesla K80 (sm_37, 11.170 GiB / 11.173 GiB available)

Installation of CUDA.jl on the K80 node also took less than 5 minutes, which suggests a problem with the P100 setup. Any hints as to where the problem might lie would be much appreciated!

Hard to tell what’s causing this. See the documentation for a couple of suggestions: Fixing precompilation hangs due to open tasks or IO · The Julia Language

The problem seems to have resolved itself overnight. Running the same sequence of commands today installed CUDA.jl successfully. Perhaps the GPU node was at fault yesterday. Thank you @maleadt for your suggestion.
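
For anyone landing here with the same symptom, a quick sanity check after the install goes through might look like the sketch below. It is not from the posts above; it assumes CUDA.jl is already added to the active environment and only uses `Pkg.precompile`, `CUDA.functional`, `CUDA.versioninfo` and `CUDA.ones`. The array size and the final check are arbitrary choices for illustration.

```julia
import Pkg

# Re-run precompilation explicitly, so a hang (if it comes back) shows up
# here rather than in the middle of a later `using`.
Pkg.precompile()

using CUDA

if CUDA.functional()
    # Print driver, runtime, library and device information,
    # comparable to the K80 output quoted above.
    CUDA.versioninfo()

    # Tiny smoke test (illustrative only): allocate on the GPU and reduce.
    x = CUDA.ones(Float32, 1024)
    @assert sum(x) == 1024f0
else
    @warn "CUDA.jl loaded, but no functional GPU setup was detected on this node"
end
```

If `CUDA.functional()` returns `false` on the P100 node while the same environment works on the K80 node, that points at the node's driver or runtime setup rather than at the package installation.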