Problem with CUDA test

Hello,

This is my first post, and I am new to Julia. I have to say I love Julia so far, now that I have gotten over my ‘scoping annoyances’. :wink:
So far I have written a multithreaded prime-generating program on the CPU, which was good fun (and super fast). Now I am trying to get into GPU programming (CUDA).

My question concerns the output of the “test CUDA” command.
I am not sure how to format this nicely, but here is the output.
The test suite is complaining about the testset core\initialization.

I am running on an MSI laptop.
OS: Windows 11 Pro
NVIDIA card: NVIDIA GeForce RTX 4080 Laptop GPU

Can anybody advise me on this topic?

PS: I tried to post the whole output, but it was too much for this forum. So I cut away most of it and left only the summary.

Regards,
Erwin Moller

julia>

(@v1.10) pkg> add CUDA
Resolving package versions…
No Changes to C:\Users\Erwin\.julia\environments\v1.10\Project.toml
No Changes to C:\Users\Erwin\.julia\environments\v1.10\Manifest.toml

(@v1.10) pkg> test CUDA
Testing CUDA
Status C:\Users\Erwin\AppData\Local\Temp\jl_1vbZLL\Project.toml
⌅ [79e6a3ab] Adapt v3.7.2

(SNIP)

Test Summary:                                    |  Pass  Fail  Error  Broken  Total  Time
  Overall                                        | 20427     5      2       9  20443
    core/initialization                          |                  1              1
    gpuarrays\interface                          |     7                           7
    gpuarrays\uniformscaling                     |    56                          56
    gpuarrays\indexing scalar                    |   477                         477
    gpuarrays\reductions/any all count           |   101                         101
    gpuarrays\math/intrinsics                    |    12                          12
    gpuarrays\indexing find                      |    45                          45
    gpuarrays\indexing multidimensional          |    70                          70
    gpuarrays\math/power                         |    72                          72
    gpuarrays\reductions/mapreducedim!_large     |    50                          50
    gpuarrays\reductions/reducedim!              |   192                         192
    gpuarrays\linalg/mul!/vector-matrix          |   168                         168
    gpuarrays\random                             |    62                          62
    gpuarrays\constructors                       |   900                         900
    gpuarrays\statistics                         |    84                          84
    gpuarrays\base                               |    75                          75
    gpuarrays\linalg                             |   275                         275
    base\aqua                                    |     7     1      1              9
    gpuarrays\linalg/mul!/matrix-matrix          |   432                         432
    gpuarrays\reductions/== isequal              |   312                         312
    base\broadcast                               |    22                          22
    gpuarrays\reductions/mapreduce               |   396                         396
    base\iterator                                |    43                          43
    gpuarrays\reductions/mapreducedim!           |   312                         312
    gpuarrays\linalg/norm                        |   696                         696
    base\linalg                                  |    21                          21
    gpuarrays\reductions/reduce                  |   264                         264
    base\exceptions                              |    17                          17
    base\random                                  |   125                         125
    core\apiutils                                |     6                           6
    gpuarrays\reductions/minimum maximum extrema |   666                         666
    base\threading                               |                              None
    core\codegen                                 |    14                          14
    base\examples                                |     7                           7
    core\nvml                                    |    11                          11
    core\initialization                          |    26     4                    30
    core\pointer                                 |    35                          35
    core\cudadrv                                 |   138                    1    139
    core\utils                                   |    55                          55
    base\kernelabstractions                      |  2361                    4   2365
    core\device\array                            |    20                          20
    base\array                                   |   343                         343
    core\pool                                    |    10                          10
    core\device\ldg                              |    22                          22
    core\device\intrinsics                       |    38                          38
    core\device\intrinsics\memory                |    16                          16
    base\texture                                 |    38                    4     42
    gpuarrays\reductions/sum prod                |   862                         862
    core\device\random                           |   156                         156
    libraries\curand                             |     1                           1
    core\execution                               |    78                          78
    core\device\intrinsics\output                |    40                          40
    core\device\intrinsics\atomics               |   147                         147
    libraries\cufft                              |   177                         177
    core\device\intrinsics\math                  |   104                         104
    libraries\cusolver\multigpu                  |    30                          30
    libraries\cusparse\device                    |    10                          10
    libraries\cusolver\sparse                    |   112                         112
    gpuarrays\broadcasting                       |   402                         402
    base\sorting                                 |   273                         273
    libraries\cusparse\conversions               |   130                         130
    libraries\cusparse\broadcast                 |    65                          65
    libraries\cusparse                           |   713                         713
    core\device\intrinsics\wmma                  |   446                         446
    libraries\cusparse\linalg                    |    86                          86
    libraries\cusparse\generic                   |  1076                        1076
    libraries\cublas                             |  2256                        2256
    libraries\cusolver\dense                     |  2424                        2424
    libraries\cusparse\interfaces                |  1740                        1740
FAILURE

Error in testset core/initialization:
Error During Test at none:1
  Got exception outside of a @test
  KeyError: key "core/initialization" not found
Error in testset base\aqua:
Test Failed at C:\Users\Erwin\.julia\packages\Aqua\9p8ck\src\deps_compat.jl:60
  Expression: isempty(result)
   Evaluated: isempty(Base.PkgId[LazyArtifacts [4af54fe1-eca0-43a8-85a7-787d91b784e3], Libdl [8f399da3-3557-5675-b5ff-fb832c97cbdb], LinearAlgebra [37e2e46d-f89d-539d-b4ee-838fcccc9c8e], Logging [56ddb016-857b-54e1-b83d-db4d58db5568], Printf [de0858da-6303-5e67-8744-51eddeeeb8d7], Random [9a3f8284-a2c9-5f02-9a11-845980a1fd5c], SparseArrays [2f01184e-e22b-5df5-ae63-d93ebab69eaf]])

Error in testset base\aqua:
Error During Test at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\setup.jl:66
  Got exception outside of a @test
  LoadError: UndefVarError: test_project_toml_formatting not defined
  Stacktrace:
    [1] getproperty(x::Module, f::Symbol)
      @ Base .\Base.jl:31
    [2] top-level scope
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\base\aqua.jl:20
    [3] include
      @ .\client.jl:489 [inlined]
    [4] #13
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\runtests.jl:97 [inlined]
    [5] macro expansion
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\setup.jl:67 [inlined]
    [6] macro expansion
      @ C:\Users\Erwin\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Test\src\Test.jl:1577 [inlined]
    [7] macro expansion
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\setup.jl:67 [inlined]
    [8] macro expansion
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\src\utilities.jl:25 [inlined]
    [9] macro expansion
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\src\pool.jl:607 [inlined]
   [10] top-level scope
      @ C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\setup.jl:66
   [11] eval
      @ .\boot.jl:385 [inlined]
   [12] runtests(f::Function, name::String, time_source::Symbol)
      @ Main C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\setup.jl:78
   [13] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
      @ Base .\essentials.jl:892
   [14] invokelatest(::Any, ::Any, ::Vararg{Any})
      @ Base .\essentials.jl:889
   [15] (::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}})()
      @ Distributed C:\Users\Erwin\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Distributed\src\process_messages.jl:287
   [16] run_work_thunk(thunk::Distributed.var"#110#112"{Distributed.CallMsg{:call_fetch}}, print_error::Bool)
      @ Distributed C:\Users\Erwin\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Distributed\src\process_messages.jl:70
   [17] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
      @ Distributed C:\Users\Erwin\.julia\juliaup\julia-1.10.2+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Distributed\src\process_messages.jl:287
  in expression starting at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\base\aqua.jl:20
Error in testset core\initialization:
Test Failed at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\core\initialization.jl:5
  Expression: !(has_context())

Error in testset core\initialization:
Test Failed at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\core\initialization.jl:6
  Expression: !(has_device())

Error in testset core\initialization:
Test Failed at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\core\initialization.jl:12
  Expression: !(has_context())

Error in testset core\initialization:
Test Failed at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\core\initialization.jl:13
  Expression: !(has_device())

ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\Erwin\.julia\packages\CUDA\35NC6\test\runtests.jl:458
ERROR: Package CUDA errored during testing

(@v1.10) pkg>

These failures are not an issue. The Aqua failures only matter to developers, and the initialization tests should normally run first, but that doesn’t happen here because of a path separator issue (as is evident from the KeyError preceding it). So everything seems to be working fine.
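For anyone wondering how a path separator can produce that KeyError, here is a minimal sketch. It is a made-up illustration and not CUDA.jl's actual test runner code; it only assumes that test names are derived from file paths (so they contain `\` on Windows) while the lookup uses a name written with `/`. The dictionary `tests` and its contents are hypothetical.

```julia
# Hypothetical illustration only; this is not CUDA.jl's actual test runner code.
# Assumption: test names come from file paths, so on Windows they contain '\'.
tests = Dict(
    "core\\initialization" => "should run first",
    "base\\array"          => "runs later",
)

wanted = "core/initialization"   # the name written with a forward slash

# The lookup misses because the stored key uses a backslash;
# indexing with tests[wanted] would throw a KeyError here.
found = haskey(tests, wanted)
println("found before normalizing: ", found)

# Normalizing the separators makes the same lookup succeed.
normalized = Dict(replace(name, '\\' => '/') => note for (name, note) in tests)
println("after normalizing: ", normalized[wanted])
```

The point is only that the two spellings are different strings, so a test that is looked up by its `/` name never gets matched on Windows.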

Also, please use triple backticks to denote output; your post is pretty hard to read.

Actually, I thought the path separator issue was fixed already. Which version of CUDA.jl were you testing?

Thank you maleadt for your quick reply.
CUDA.versioninfo() says:

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.4
NVIDIA driver 551.61.0

CUDA libraries:

  • CUBLAS: 12.1.3
  • CURAND: 10.3.2
  • CUFFT: 11.0.2
  • CUSOLVER: 11.4.5
  • CUSPARSE: 12.1.0
  • CUPTI: 18.0.0
  • NVML: 12.0.0+551.61

Julia packages:

  • CUDA: 4.4.1
  • CUDA_Driver_jll: 0.5.0+1
  • CUDA_Runtime_jll: 0.6.0+0

Toolchain:

  • Julia: 1.10.2
  • LLVM: 15.0.7
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
  • Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
0: NVIDIA GeForce RTX 4080 Laptop GPU (sm_89, 10.713 GiB / 11.994 GiB available)

I hope that helps.

Regards,
Erwin

This is an old version. You can see in the releases section that we’re at version 5.x already: GitHub - JuliaGPU/CUDA.jl: CUDA programming in Julia.

If you’re not sure why an old version is being installed, try forcing installation of the latest version by doing ]add CUDA#5.2. You can also always test the latest version in a clean environment by first doing ]activate --temp.

Sounds like Markdown-ish. I’ll remember that! Thank you for your time

Hmm, that gives me:

(@v1.10) pkg> add CUDA#5.2
     Cloning git-repo `https://github.com/JuliaGPU/CUDA.jl.git`
    Updating git-repo `https://github.com/JuliaGPU/CUDA.jl.git`
ERROR: Did not find rev 5.2 in repository

Sorry, it’s CUDA@5.2.
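
If you prefer a script over the Pkg REPL, the same steps (temporary environment, pinned version, test run) can be written with the Pkg API. This is only a sketch: the function name `test_cuda_in_temp_env` is made up for this example, and actually calling it needs network access and a working GPU setup.

```julia
using Pkg

# Hypothetical helper (the name is made up for this example).
# It mirrors `]activate --temp`, `]add CUDA@5.2` and `]test CUDA`.
function test_cuda_in_temp_env(version::AbstractString = "5.2")
    Pkg.activate(; temp = true)                 # fresh, throwaway environment
    Pkg.add(name = "CUDA", version = version)   # same as `add CUDA@5.2`
    Pkg.status("CUDA")                          # show which version was resolved
    Pkg.test("CUDA")                            # run the package test suite
end
```

Calling `test_cuda_in_temp_env()` installs into the temporary environment, so the default `@v1.10` environment is not modified.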

Thanks Maleadt,

That worked!
This is the new output for test CUDA:

Test Summary:                                  |  Pass  Fail  Broken  Total  Time
  Overall                                      | 23740     1      10  23751
    core/initialization                        |    30                   30
    gpuarrays/indexing scalar                  |   477                  477
    gpuarrays/indexing find                    |    45                   45
    gpuarrays/math/power                       |    72                   72
    gpuarrays/interface                        |     7                    7
    gpuarrays/reductions/any all count         |   101                  101
    gpuarrays/indexing multidimensional        |    73                   73
    gpuarrays/uniformscaling                   |    56                   56
    gpuarrays/reductions/reducedim!            |   192                  192
    gpuarrays/linalg/mul!/vector-matrix        |   168                  168
    gpuarrays/math/intrinsics                  |    12                   12
    gpuarrays/linalg                           |   347                  347
    gpuarrays/reductions/mapreducedim!_large   |    50                   50
    gpuarrays/statistics                       |    84                   84
    gpuarrays/linalg/mul!/matrix-matrix        |   432                  432
    gpuarrays/constructors                     |   942                  942
    gpuarrays/random                           |    62                   62
    gpuarrays/reductions/mapreduce             |   396                  396
    gpuarrays/base                             |    96                   96
    gpuarrays/linalg/norm                      |   696                  696
    gpuarrays/reductions/minimum maximum extrema |   666                  666
    gpuarrays/reductions/== isequal            |   312                  312
    base/aqua                                  |     9     1             10
    gpuarrays/reductions/mapreducedim!         |   312                  312
    gpuarrays/reductions/reduce                |   264                  264
    base/broadcast                             |    28                   28
    gpuarrays/reductions/sum prod              |   862                  862
    base/iterator                              |    43                   43
    gpuarrays/broadcasting                     |   364                  364
    base/array                                 |   400                  400
    base/linalg                                |    39                   39
    base/kernelabstractions                    |  2361             4   2365
    base/examples                              |     7                    7
    base/random                                |   236                  236
    base/threading                             |                       None
    core/apiutils                              |     6                    6
    core/codegen                               |    17                   17
    core/cudadrv                               |   138             1    139
    core/nvml                                  |    27             1     28
    core/pointer                               |    35                   35
    core/pool                                  |    10                   10
    base/texture                               |    38             4     42
    core/utils                                 |    52                   52
    core/device/array                          |    20                   20
    core/profile                               |    21                   21
    core/device/intrinsics                     |    38                   38
    core/execution                             |    82                   82
    core/device/ldg                            |    41                   41
    core/device/random                         |   156                  156
    core/device/intrinsics/atomics             |   147                  147
    base/sorting                               |   273                  273
    core/device/intrinsics/memory              |    16                   16
    core/device/intrinsics/output              |    41                   41
    core/device/intrinsics/cooperative_groups  |   515                  515
    core/device/intrinsics/math                |   112                  112
    libraries/curand                           |     1                    1
    libraries/cufft                            |   245                  245
    core/device/intrinsics/wmma                |   446                  446
    libraries/cusolver/dense_generic           |   108                  108
    libraries/cusolver/multigpu                |    30                   30
    libraries/cusparse                         |   863                  863
    libraries/cusolver/sparse                  |   112                  112
    libraries/cusolver/sparse_factorizations   |    36                   36
    libraries/cusparse/bmm                     |    20                   20
    libraries/cusparse/broadcast               |    65                   65
    libraries/cublas                           |  2421                 2421
    libraries/cusparse/device                  |    10                   10
    base/exceptions                            |    17                   17
    libraries/cusparse/conversions             |   130                  130
    libraries/cusparse/reduce                  |                       None
    libraries/cusolver/dense                   |  3904                 3904
    libraries/cusparse/generic                 |  1088                 1088
    libraries/cusparse/linalg                  |    94                   94
    libraries/cusparse/interfaces              |  2124                 2124
    FAILURE

Error in testset base/aqua:
Test Failed at C:\Users\Erwin\.julia\packages\CUDA\htRwP\test\base\aqua.jl:12
  Expression: length(ambs) ≤ 15
   Evaluated: 24 ≤ 15

ERROR: LoadError: Test run finished with errors
in expression starting at C:\Users\Erwin\.julia\packages\CUDA\htRwP\test\runtests.jl:465
ERROR: Package CUDA errored during testing

(@v1.10) pkg>

So only the Aqua failure is left. No problem, since I am not a CUDA.jl developer.
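
For context on that remaining failure: judging by the expression `length(ambs) ≤ 15`, the check appears to count method ambiguities (this reading is an assumption based on the variable name). Below is a small standalone sketch of what a method ambiguity is, using `detect_ambiguities` from the `Test` standard library. The module `AmbiguityDemo` is made up and has nothing to do with CUDA.jl.

```julia
using Test

# Made-up module with two methods that overlap for f(::Int, ::Int).
module AmbiguityDemo
f(x::Int, y) = "first argument is an Int"
f(x, y::Int) = "second argument is an Int"
end

# Each entry is a pair of methods that are ambiguous with each other.
ambs = detect_ambiguities(AmbiguityDemo)
println("ambiguities found: ", length(ambs))

# Unambiguous calls still work; only f(1, 2) would throw a MethodError.
println(AmbiguityDemo.f(1, "a"))
```

Such ambiguities only bite when a call matches both methods equally well, which fits the earlier remark that this kind of failure mainly concerns package developers.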

Thank you for helping out a newbie.

Regards,
Erwin

Hey all, piggybacking on this thread.

I’m trying to test my installation of CUDA.jl, but the process repeatedly gets stuck. I get to this:

     Testing Running tests...
┌ Info: System information:
│ CUDA runtime 12.5, artifact installation
│ CUDA driver 12.5
│ NVIDIA driver 555.42.2
│
│ CUDA libraries:
│ - CUBLAS: 12.5.2
│ - CURAND: 10.3.6
│ - CUFFT: 11.2.3
│ - CUSOLVER: 11.6.2
│ - CUSPARSE: 12.4.1
│ - CUPTI: 23.0.0
│ - NVML: 12.0.0+555.42.2
│
│ Julia packages:
│ - CUDA: 5.4.2
│ - CUDA_Driver_jll: 0.9.0+0
│ - CUDA_Runtime_jll: 0.14.0+1
│
│ Toolchain:
│ - Julia: 1.10.4
│ - LLVM: 15.0.7
│
│ 1 device:
└   0: NVIDIA GeForce GTX 960 (sm_52, 3.906 GiB / 4.000 GiB available)
[ Info: Testing using device 0 (NVIDIA GeForce GTX 960). To change this, specify the `--gpu` argument to the tests, or set the `CUDA_VISIBLE_DEVICES` environment variable.
[ Info: Running 1 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
                                                  |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
core/initialization                           (2) |     4.25 |   0.00 |  0.0 |       0.00 |    38.75 |   0.01 |  0.2 |      68.11 |   960.12 |
gpuarrays/reductions/sum prod                 (2) |   140.65 |   0.03 |  0.0 |       3.24 |    47.25 |   3.96 |  2.8 |    9478.70 |  2582.38 |
gpuarrays/reductions/reduce                   (2) |    61.18 |   0.01 |  0.0 |       1.21 |    48.25 |   2.07 |  3.4 |    5538.02 |  2908.03 |
gpuarrays/reductions/mapreducedim!            (2) |    58.97 |   0.01 |  0.0 |       1.54 |    50.25 |   1.59 |  2.7 |    3610.01 |  3391.95 |
gpuarrays/broadcasting                        (2) |   142.37 |   0.03 |  0.0 |       2.00 |    55.25 |   4.37 |  3.1 |   10717.22 |  4425.68 |
gpuarrays/reductions/== isequal               (2) |    41.19 |   0.02 |  0.0 |       1.07 |    56.25 |   1.21 |  2.9 |    3864.05 |  4718.70 |
gpuarrays/base                                (2) |    24.70 |   0.00 |  0.0 |       8.90 |    56.25 |   1.35 |  5.5 |    2331.02 |  5102.08 |
gpuarrays/random                              (2) |    22.33 |   0.00 |  0.0 |       0.03 |    57.25 |   0.22 |  1.0 |     931.57 |  5110.96 |
gpuarrays/vectors                             (2) |     0.27 |   0.00 |  0.0 |       0.00 |    57.25 |   0.00 |  0.0 |      16.70 |  5111.02 |
gpuarrays/constructors                        (2) |    20.33 |   0.01 |  0.1 |       0.65 |    57.25 |   0.58 |  2.8 |    1068.17 |  5708.57 |
gpuarrays/reductions/mapreduce                (2) |    27.10 |   0.02 |  0.1 |       1.81 |    57.25 |   0.54 |  2.0 |    1968.25 |  5708.57 |
gpuarrays/statistics                          (2) |    57.47 |   0.01 |  0.0 |       1.51 |    66.25 |   1.16 |  2.0 |    3771.41 |  5932.20 |

Then whatever is being tested next takes hours and never completes. Any guess as to what might be causing that?

EDIT: I tried again and when the program is left hanging, I can’t even ssh to the computer where the test is taking place. In the end, I have to pull the plug on the machine.

Hi jrekier,

I googled and found this overview:

Look up your GPU (NVIDIA GeForce GTX 960) and you’ll see that it is a bit old.
Your maximum supported CUDA version seems to be CUDA 10.

But you are using CUDA driver 12.5.

Hey @ErwinMoller, thanks.

Hm, that might be why. Although I’ve been running some tests manually and it seems to be working okay, aside from the automated CUDA.jl test…