Failed tests of the CUDA package

There are a fair few failed tests when running ‘test CUDA’ in the package manager.

Is this normal, or does it come from a mismatch between the driver and the Julia package?
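For reference, this is roughly how the test suite is being invoked; a minimal sketch of the equivalent REPL call, not quoted from the post:

using Pkg

# Equivalent to typing `test CUDA` at the pkg> prompt.
Pkg.test("CUDA")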

CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.8

CUDA libraries:

  • CUBLAS: 12.3.4
  • CURAND: 10.3.4
  • CUFFT: 11.0.12
  • CUSOLVER: 11.5.4
  • CUSPARSE: 12.2.0
  • CUPTI: 21.0.0
  • NVML: 12.0.0+545.23.8

Julia packages:

  • CUDA: 5.1.1
  • CUDA_Driver_jll: 0.7.0+0
  • CUDA_Runtime_jll: 0.10.1+0

Toolchain:

  • Julia: 1.9.4
  • LLVM: 14.0.6

nvidia-smi
Sun Dec 10 21:04:32 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:04:00.0  On |                  N/A |
| 29%   37C    P8              34W / 250W |    382MiB / 11264MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3404      G   /usr/libexec/Xorg                           229MiB |
|    0   N/A  N/A      3562      G   xfwm4                                         4MiB |
|    0   N/A  N/A      4708      G   …sion,SpareRendererForSitePerProcess        141MiB |
+---------------------------------------------------------------------------------------+

Failed tests:
libraries/cusolver/sparse (2) | failed at 2023-12-10T21:06:37.879
libraries/cusparse/conversions (16) | failed at 2023-12-10T21:06:45.547
libraries/cusparse/generic (49) | failed at 2023-12-10T21:06:57.911
core/cudadrv (42) | failed at 2023-12-10T21:06:58.056
base/aqua (28) | failed at 2023-12-10T21:06:58.198
base/threading (39) | failed at 2023-12-10T21:06:58.420
core/codegen (41) | failed at 2023-12-10T21:07:01.370
core/device/intrinsics (50) | failed at 2023-12-10T21:07:14.386
libraries/cusolver/multigpu (46) | failed at 2023-12-10T21:07:40.119
base/texture (38) | failed at 2023-12-10T21:07:48.248
base/exceptions (32) | failed at 2023-12-10T21:08:03.453
base/examples (31) | failed at 2023-12-10T21:09:01.713

Probably related to the amount of parallelism; see “Testsuite could be more careful about parallel testing” (Issue #2192, JuliaGPU/CUDA.jl on GitHub). The test suite prints a note about this at the start.
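One way to reduce that parallelism is to forward the test runner’s --jobs flag (used in the follow-up below); a hedged sketch using Pkg’s standard test_args mechanism:

using Pkg

# Run the CUDA.jl test suite with a single test worker to rule out
# failures caused by parallel test execution (see issue #2192).
Pkg.test("CUDA"; test_args=["--jobs=1"])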

Thank you! The system works for my code, so I figured the test suite was over-stressing it somehow.

FWIW, even with --jobs=1, tests fail in:

base/aqua (2) | failed at 2024-01-02T22:40:23.437

Also, on an RTX 2080 Ti with Ubuntu 22.04, CUDA driver 12.2, CUDA runtime 12.3 (artifact installation), NVIDIA driver 535.129.03, Julia 1.10.0, and CUDA.jl 5.1.1.

The final report shows 1 Error and 9 Broken:

Test suite                                 |  Pass | Error | Broken | Total
base/aqua                                  |     8 |     1 |        |     9
base/kernelabstractions                    |  2361 |       |      4 |  2365
base/texture                               |    38 |       |      4 |    42
core/cudadrv                               |   139 |       |      1 |   140

The Aqua tests are aimed at package developers; failures there are nothing for end users to worry about.
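For context, Aqua.jl runs static quality-assurance checks on a package; a generic illustration of how such checks are invoked (not CUDA.jl’s actual test file):

using Aqua, CUDA

# Checks for method ambiguities, undefined exports, stale dependencies,
# unbound type parameters, and similar developer-facing issues.
Aqua.test_all(CUDA)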

OK, thanks!

What about the three suites marked broken:

base/kernelabstractions
base/texture
core/cudadrv

Would you like me to print more info? (And how would I do that?)

Broken tests are explicitly marked as broken by CUDA.jl developers, and are not relevant to end users. They are not test failures.
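For context, “broken” here refers to Julia’s standard Test machinery; a small generic illustration (not CUDA.jl’s actual test code):

using Test

@testset "broken vs. failing" begin
    @test 1 + 1 == 2          # passes
    @test_broken 1 + 1 == 3   # recorded in the Broken column, not as a failure
end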
