Running the CUDA.jl test suite causes my Ubuntu 22.04 PC to freeze and become unresponsive

I have a PC running Ubuntu with an NVIDIA GPU (RTX 3080 Ti). I recently installed CUDA.jl, and I believe the PC meets all the requirements; it has the NVIDIA drivers installed. But when I enter

pkg> test CUDA

there was an error/warning saying that LD_LIBRARY_PATH was changed, even though I didn't do anything with it. So I typed:

julia> ENV["LD_LIBRARY_PATH"] = ""

and then, after I ran test CUDA again, my system froze after a while, supposedly because of a memory leak? Why is this happening?

Perhaps you are running more threads than the amount of RAM you have allows?

Try to run top or htop in a second terminal and check your RAM usage.
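You can also poll this from a second Julia session; Sys.free_memory and Sys.total_memory are part of Base and return bytes:

julia> Sys.free_memory() / 2^30   # free RAM in GiB

julia> Sys.total_memory() / 2^30  # total RAM in GiB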

Run:

# the test suite takes command-line options that allow customization; pass --help for details:
Pkg.test("CUDA"; test_args=`--help`)

to find out how to run it with fewer threads.

Please provide the actual error.

Here is the error:

┌ Warning: CUDA runtime library `libcublasLt.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcublasLt.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libnvJitLink.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libnvJitLink.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libcusparse.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcusparse.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219

Those are definitely problematic – do you have LD_LIBRARY_PATH set, and if so, why?
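For reference, you can check from within Julia:

julia> get(ENV, "LD_LIBRARY_PATH", "")  # "" means it is unset (or was explicitly cleared)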

Why do you suppose this?

As @ufechner7 mentions, this may simply be because too many tests are launching in parallel. The test suite is explicit about this, and prints at the start:

[ Info: Running 23 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
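For instance, to cap the number of parallel jobs (the value 4 is just an example; pick whatever your RAM allows):

julia> Pkg.test("CUDA"; test_args=`--jobs=4`)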

Sorry for the late reply. I ran into this issue on my lab's shared computer. I recently looked up LD_LIBRARY_PATH online and came across a general sentiment that setting the LD_LIBRARY_PATH variable should always be avoided. I didn't actually set the variable myself, but the previous lab member who installed Ubuntu might have. I asked him about it, and he answered that it's used for the CUDA library. And it's true that:

echo $LD_LIBRARY_PATH
> /usr/local/cuda-12.3/lib64:

The question I have is: how can I make my CUDA libraries work without using LD_LIBRARY_PATH? I will definitely look into it more when I'm less busy.
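(From what I've read so far, CUDA.jl can download its own runtime libraries as artifacts, so after removing the export I could presumably verify which libraries get picked up with:

julia> using CUDA

julia> CUDA.versioninfo()  # should list artifact-provided libraries rather than /usr/local/cuda-12.3

but I haven't tried this yet.)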

Regarding the supposed memory leak: I guessed that, because after I ran the test, the screen hard-froze on whatever it was doing, with no mouse movement etc. And in the system monitor statistics in Ubuntu's top activity bar, the memory bar was filled to the maximum.

I later ran a test with fewer parallel tasks (but I didn't copy the output, it was a few weeks ago, and I can't remember the details; I will try it with 1 thread later this week), and it still caused an unrecoverable freeze (memory leak?).

Lastly, I looked into the Z shell config file ~/.zshrc and found:

# Cuda: 13 Post-Installation Actions
export PATH="/usr/local/cuda-12.3/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH"

Should I delete these lines?

Since the last update I've run several tests on different machines. I get errors each time. I don't know why, but I'm running these tests out of curiosity. Should Pkg.test not be run on CUDA.jl? Anyway, I have run 4 tests and uploaded the terminal outputs to my Google Drive. The most significant errors encountered include:
On an Ubuntu 22.04 PC with Julia 1.10.4:

gpuarrays/linalg                              (4) |   133.38 |   0.01 |  0.0 |      26.27 |   332.00 |   3.25 |  2.4 |    5448.20 |  5158.79 |
      From worker 5:    WARNING: Method definition #3677#kernel(Any) in module Main at /home/razydave/.julia/packages/CUDA/2kjXI/test/core/execution.jl:360 overwritten at /home/razydave/.julia/packages/CUDA/2kjXI/test/core/execution.jl:368.
core/execution                                (5) |    46.27 |   0.00 |  0.0 |       0.02 |   436.00 |   0.65 |  1.4 |    1951.20 |  4364.30 |
libraries/cusparse                            (2) |   155.26 |   0.04 |  0.0 |      12.58 |   364.00 |   2.92 |  1.9 |    5857.94 |  3872.09 |
core/cudadrv                                  (2) |     9.35 |   0.00 |  0.0 |       0.00 |   376.00 |   0.00 |  0.0 |     270.01 |  3872.09 |

and after a while, it froze.

On a Windows laptop with Julia 1.10.4:

libraries/cublas                              (2) |   166.17 |   0.07 |  0.0 |      41.45 |      N/A |  11.24 |  6.8 |    8474.35 |  4991.34 |
      From worker 2:    WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse                            (2) |   106.34 |   0.04 |  0.0 |      12.58 |      N/A |   7.36 |  6.9 |    4659.45 |  4991.34 |
      From worker 2:    Internal error: encountered unexpected error in runtime:
      From worker 2:    ReadOnlyMemoryError()
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.

And on a separate occasion on Windows:

gpuarrays/indexing scalar                     (2) |    16.27 |   0.00 |  0.0 |       0.01 |      N/A |   0.00 |  0.0 |     584.41 |  4757.24 |
libraries/cublas                              (2) |   230.10 |   0.06 |  0.0 |      41.45 |      N/A |   4.02 |  1.7 |    8475.28 |  4757.24 |
      From worker 2:    WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse                            (2) |   145.48 |   0.05 |  0.0 |      12.58 |      N/A |   2.67 |  1.8 |    4659.99 |  4757.24 |
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks

What do these errors mean?

Not an error, but an expected warning.

Innocuous warning.

This is problematic. Could you try the master branch of CUDA.jl? Also, which test suite is triggering this error? It should be listed below, as a failed test suite executed by worker 2. If it doesn't report back, try executing with --verbose (Pkg.test("CUDA"; test_args=`--verbose`)).

Hello. How can I use the master branch from GitHub?

pkg> add CUDA#master

I've now switched to the master branch, thank you. But even after that, the problem seems to persist. I ran julia | tee output.txt to capture the terminal output as text, but it only saved the first few lines. As @maleadt instructed, I ran

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.4 (2024-06-04)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.10) pkg> activate . # I created a separate env where I added the master branch

julia> import Pkg

julia> Pkg.test("CUDA"; test_args=`--verbose`)

and the terminal output file didn't save the rest (the test output). I have now attached a few images to the Google Drive folder that I shared; the latest one is from the run with the master branch. I couldn't obtain the terminal output because the computer was frozen. Some kind of memory leak seems to have happened, because the top command showed 100% memory usage.
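(In case it helps others: a possible workaround for the truncated tee output might be to run the tests non-interactively so that everything goes through the pipe. A rough sketch, with a made-up script name:

# run_cuda_tests.jl
import Pkg
Pkg.test("CUDA"; test_args=`--verbose`)

# then, from the shell:
# julia --project=. run_cuda_tests.jl 2>&1 | tee output.txt

I haven't been able to verify this on the machine that freezes, though.)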

Have you tried running with fewer threads, as suggested above? In addition to --verbose you can pass --jobs=1.
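That is:

julia> Pkg.test("CUDA"; test_args=`--verbose --jobs=1`)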

I did, a while ago. Maybe I'll try with --jobs=1 tonight.