Running the CUDA.jl test suite causes my Ubuntu 22.04 PC to freeze and become unresponsive

I have a PC running Ubuntu with an NVIDIA GPU (RTX 3080 Ti). I recently installed CUDA.jl, and I believe the PC meets all the requirements; it has the NVIDIA drivers installed. But when I enter

pkg> test CUDA

there was an error/warning saying that LD_LIBRARY_PATH was changed, even though I didn't do anything with it. So I typed:

julia> ENV["LD_LIBRARY_PATH"] = ""

and then, after I ran test CUDA again, my system froze after a while, supposedly because of a memory leak? Why is this happening?

Perhaps you are running more threads than the amount of RAM you have allows?

Try to run top or htop in a second terminal and check your RAM usage.
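You can also poll this from a second Julia session; Sys.free_memory and Sys.total_memory are part of Base and return bytes:

julia> Sys.free_memory() / 2^30   # free RAM in GiB

julia> Sys.total_memory() / 2^30  # total RAM in GiB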

Run:

# the test suite takes command-line options that allow customization; pass --help for details:
Pkg.test("CUDA"; test_args=`--help`)

to find out how to run it with fewer threads.

Please provide the actual error.

Here is the error:

┌ Warning: CUDA runtime library `libcublasLt.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcublasLt.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libnvJitLink.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libnvJitLink.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libcusparse.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcusparse.so.12`.
│ 
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│ 
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219

Those are definitely problematic – do you have LD_LIBRARY_PATH set, and if so, why?
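For reference, you can check from within Julia:

julia> get(ENV, "LD_LIBRARY_PATH", "")  # "" means it is unset (or was explicitly cleared)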

Why do you suppose this?

As @ufechner7 mentions, this may simply be because too many tests are launching in parallel. The test suite is explicit about this, and prints at the start:

[ Info: Running 23 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
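For instance, to cap the number of parallel jobs (the value 4 is just an example; pick whatever your RAM allows):

julia> Pkg.test("CUDA"; test_args=`--jobs=4`)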

Sorry for the late reply. I ran into this issue on my lab's shared computer. I recently looked up LD_LIBRARY_PATH online and came across a general sentiment that setting the LD_LIBRARY_PATH variable should always be avoided. I didn't actually set the variable myself, but the previous lab member who installed Ubuntu might have. I asked him about it, and he answered that it's used for the CUDA library. And it's true that:

echo $LD_LIBRARY_PATH
> /usr/local/cuda-12.3/lib64:

The question I have is: how can I make my CUDA libraries work without using LD_LIBRARY_PATH? I will definitely look into it more when I'm less busy.
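(From what I've read so far, CUDA.jl can download its own runtime libraries as artifacts, so after removing the export I could presumably verify which libraries get picked up with:

julia> using CUDA

julia> CUDA.versioninfo()  # should list artifact-provided libraries rather than /usr/local/cuda-12.3

but I haven't tried this yet.)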

Regarding the supposed memory leak: I guessed that, because after I ran the test, the screen hard-froze on whatever it was doing, with no mouse movement etc. And in the system monitor statistics in Ubuntu's top activity bar, the memory bar was filled to the maximum.

I later ran a test with fewer parallel tasks (but I didn't copy the output, it was a few weeks ago, and I can't remember the details; I will try it with 1 thread later this week), and it still caused an unrecoverable freeze (memory leak?).

Lastly, I looked into the Z shell config file ~/.zshrc and found:

# Cuda: 13 Post-Installation Actions
export PATH="/usr/local/cuda-12.3/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH"

Should I delete these lines?

Since the last update I've run several tests on different machines. I get errors each time. I don't know why, but I'm running these tests out of curiosity. Should Pkg.test not be run on CUDA.jl? Anyway, I have run 4 tests and uploaded the terminal outputs to my Google Drive. The most significant errors encountered include:
On an Ubuntu 22.04 PC with Julia 1.10.4:

gpuarrays/linalg                              (4) |   133.38 |   0.01 |  0.0 |      26.27 |   332.00 |   3.25 |  2.4 |    5448.20 |  5158.79 |
      From worker 5:    WARNING: Method definition #3677#kernel(Any) in module Main at /home/razydave/.julia/packages/CUDA/2kjXI/test/core/execution.jl:360 overwritten at /home/razydave/.julia/packages/CUDA/2kjXI/test/core/execution.jl:368.
core/execution                                (5) |    46.27 |   0.00 |  0.0 |       0.02 |   436.00 |   0.65 |  1.4 |    1951.20 |  4364.30 |
libraries/cusparse                            (2) |   155.26 |   0.04 |  0.0 |      12.58 |   364.00 |   2.92 |  1.9 |    5857.94 |  3872.09 |
core/cudadrv                                  (2) |     9.35 |   0.00 |  0.0 |       0.00 |   376.00 |   0.00 |  0.0 |     270.01 |  3872.09 |

and after a while, it froze.

On a Windows laptop with Julia 1.10.4:

libraries/cublas                              (2) |   166.17 |   0.07 |  0.0 |      41.45 |      N/A |  11.24 |  6.8 |    8474.35 |  4991.34 |
      From worker 2:    WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse                            (2) |   106.34 |   0.04 |  0.0 |      12.58 |      N/A |   7.36 |  6.9 |    4659.45 |  4991.34 |
      From worker 2:    Internal error: encountered unexpected error in runtime:
      From worker 2:    ReadOnlyMemoryError()
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.

And on a separate occasion on Windows:

gpuarrays/indexing scalar                     (2) |    16.27 |   0.00 |  0.0 |       0.01 |      N/A |   0.00 |  0.0 |     584.41 |  4757.24 |
libraries/cublas                              (2) |   230.10 |   0.06 |  0.0 |      41.45 |      N/A |   4.02 |  1.7 |    8475.28 |  4757.24 |
      From worker 2:    WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse                            (2) |   145.48 |   0.05 |  0.0 |      12.58 |      N/A |   2.67 |  1.8 |    4659.99 |  4757.24 |
      From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks

What do these errors mean?

Not an error, but an expected warning.

Innocuous warning.

This is problematic. Could you try the master branch of CUDA.jl? Also, which test suite is triggering this error? It should be listed below, as a failed test suite executed by worker 2. If it doesn't report back, try executing with --verbose (Pkg.test("CUDA"; test_args=`--verbose`)).

Hello. How can I use the master branch from GitHub?

pkg> add CUDA#master

I've now switched to the master branch, thank you. But even after that, the problem seems to persist. I ran julia | tee output.txt to capture the terminal output as text, but it only saved the first few lines. As @maleadt instructed, I ran

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.4 (2024-06-04)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.10) pkg> activate . # I created a separate env where I added the master branch

julia> import Pkg

julia> Pkg.test("CUDA"; test_args=`--verbose`)

and the terminal output file didn't save the rest (the test output). I have now attached a few images to the Google Drive folder that I shared; the latest one is from the run with the master branch. I couldn't obtain the terminal output because the computer was frozen. Some kind of memory leak seems to have happened, because the top command showed 100% memory usage.
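(In case it helps others: a possible workaround for the truncated tee output might be to run the tests non-interactively so that everything goes through the pipe. A rough sketch, with a made-up script name:

# run_cuda_tests.jl
import Pkg
Pkg.test("CUDA"; test_args=`--verbose`)

# then, from the shell:
# julia --project=. run_cuda_tests.jl 2>&1 | tee output.txt

I haven't been able to verify this on the machine that freezes, though.)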

Have you tried running with fewer threads, as suggested above? In addition to --verbose you can pass --jobs=1.
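That is:

julia> Pkg.test("CUDA"; test_args=`--verbose --jobs=1`)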

I did, a while ago. Maybe I'll try with --jobs=1 tonight.