I have a PC running Ubuntu with an NVIDIA GPU (3080 Ti). I recently installed CUDA.jl, and I believe the PC meets all the requirements; the NVIDIA drivers are installed. But when I enter
pkg> test CUDA
I got a warning about LD_LIBRARY_PATH, even though I never did anything with it. So I typed:
julia> ENV["LD_LIBRARY_PATH"] = ""
and then ran test CUDA again. After a while my system freezes, supposedly because of a memory leak. Why is this happening?
┌ Warning: CUDA runtime library `libcublasLt.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcublasLt.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libnvJitLink.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libnvJitLink.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
┌ Warning: CUDA runtime library `libcusparse.so.12` was loaded from a system path, `/usr/local/cuda-12.3/lib64/libcusparse.so.12`.
│
│ This may cause errors. Ensure that you have not set the LD_LIBRARY_PATH
│ environment variable, or that it does not contain paths to CUDA libraries.
│
│ In any other case, please file an issue.
└ @ CUDA ~/.julia/packages/CUDA/Tl08O/src/initialization.jl:219
Those are definitely problematic – do you have LD_LIBRARY_PATH set, and if so, why?
Why do you suppose this is a memory leak?
As @ufechner7 mentions, this may simply be because of too many parallel tests launching. The test suite is clear about this, and prints at the start:
[ Info: Running 23 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
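To make the `--jobs` suggestion concrete, here is a minimal sketch of limiting the number of parallel test workers from a shell. The function name `run_cuda_tests` is hypothetical, and the `--jobs=N` spelling is an assumption based on the info message above; check the test suite's help output if it is rejected.

```shell
# Run the CUDA.jl test suite with a limited number of parallel workers.
# Usage: run_cuda_tests <number-of-jobs>   (defaults to 1)
# NOTE: run_cuda_tests is a hypothetical helper, not part of CUDA.jl.
run_cuda_tests() {
    n_jobs="${1:-1}"
    # The escaped backticks become a Julia command literal,
    # so the test suite receives the single argument --jobs=N.
    julia -e "using Pkg; Pkg.test(\"CUDA\"; test_args=\`--jobs=${n_jobs}\`)"
}
```

Calling `run_cuda_tests 1` starts with a single worker, which should be the gentlest setting on memory. The info message also mentions the `JULIA_CPU_THREADS` environment variable as an alternative way to reduce the default.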
Sorry for the late reply. I ran into this issue on my lab's shared computer. I recently looked up LD_LIBRARY_PATH online and came across a general sentiment that I should always avoid setting the LD_LIBRARY_PATH variable. I didn't set the variable myself, but the previous lab member who installed Ubuntu might have. So I asked him about it, and he answered that it's used for the CUDA library. And it's true that:
My question is: how can I make my CUDA libraries work without using LD_LIBRARY_PATH? I will definitely look into it more when I'm less busy.
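One possible approach, sketched under assumptions: the warnings above ask that LD_LIBRARY_PATH not contain paths to CUDA libraries, so Julia could be started with those entries filtered out while leaving the rest of the variable alone. The helper name `strip_cuda_paths` is hypothetical, and matching on the substring "cuda" is an assumption that fits the `/usr/local/cuda-12.3/lib64` path from the warnings.

```shell
# Remove every entry containing "cuda" (case-insensitive) from a
# colon-separated path list and print the remaining entries.
# NOTE: strip_cuda_paths is a hypothetical helper for illustration.
strip_cuda_paths() {
    printf '%s' "$1" \
        | tr ':' '\n' \
        | grep -v -i 'cuda' \
        | paste -s -d ':' -
}
```

For a single session, something like `LD_LIBRARY_PATH="$(strip_cuda_paths "$LD_LIBRARY_PATH")" julia` should start Julia with the cleaned value, and `env -u LD_LIBRARY_PATH julia` drops the variable entirely. Setting `ENV["LD_LIBRARY_PATH"]` inside an already running Julia session is probably too late, since the dynamic loader typically reads the variable when the process starts.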
Regarding the supposed memory leak: I guessed that because after I ran the test, the screen froze hard on whatever it was showing, with no mouse movement at all. And in the system monitor statistics on the top bar in Ubuntu, the memory bar was filled to the maximum.
I ran a test later with fewer parallel tasks (I didn't copy the output, it was a few weeks ago, and I can't remember the details; I will try it with 1 thread later this week), and it still caused an unrecoverable freeze (memory leak?).
Lastly, I looked into the Z shell config file ~/.zshrc and found:
Since the last update I've run several tests on different machines. I get errors each time. I don't know why, but I'm running these tests out of curiosity. Is Pkg.test not supposed to be run on CUDA.jl? Anyway, I have run 4 tests and uploaded the terminal outputs to my Google Drive. The most significant errors encountered include:
On an Ubuntu 22.04 PC with Julia 1.10.4:
libraries/cublas (2) | 166.17 | 0.07 | 0.0 | 41.45 | N/A | 11.24 | 6.8 | 8474.35 | 4991.34 |
From worker 2: WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse (2) | 106.34 | 0.04 | 0.0 | 12.58 | N/A | 7.36 | 6.9 | 4659.45 | 4991.34 |
From worker 2: Internal error: encountered unexpected error in runtime:
From worker 2: ReadOnlyMemoryError()
From worker 2:
From worker 2: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
And on a separate occasion on Windows:
gpuarrays/indexing scalar (2) | 16.27 | 0.00 | 0.0 | 0.01 | N/A | 0.00 | 0.0 | 584.41 | 4757.24 |
libraries/cublas (2) | 230.10 | 0.06 | 0.0 | 41.45 | N/A | 4.02 | 1.7 | 8475.28 | 4757.24 |
From worker 2: WARNING: using CUSPARSE.axpby! in module Main conflicts with an existing identifier.
libraries/cusparse (2) | 145.48 | 0.05 | 0.0 | 12.58 | N/A | 2.67 | 1.8 | 4659.99 | 4757.24 |
From worker 2:
From worker 2: Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks
This is problematic. Could you try the master branch of CUDA.jl? Also, which testsuite is it that’s triggering this error? It should be listed below, as a failed testsuite executed by worker 2. If it doesn’t report back, try executing with --verbose (Pkg.test("CUDA"; test_args=`--verbose`)).
I've now cloned the master branch, thank you. But even after that, the problem seems to persist. I ran julia | tee output.txt to get the terminal output as text, but it only saved the first few lines. As @maleadt instructed, I ran
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.4 (2024-06-04)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
(@v1.10) pkg> activate . # I have created separate env where I cloned master branch
julia> import Pkg
julia> Pkg.test("CUDA"; test_args=`--verbose`)
and the terminal output file didn't save the rest (the test bits). I have now attached a few images to the Google Drive folder that I shared. The latest one is from the run on the master branch. I couldn't obtain terminal output because the computer was frozen. Some kind of memory leak seems to have happened, because the top command shows 100% memory usage.
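For capturing the log, a non-interactive run may work better than piping the REPL into `tee`. One possible reason the file stopped after the banner is that part of the output goes to stderr, which a plain pipe does not capture. The sketch below redirects both streams into the file. The function name `log_cuda_tests` is hypothetical, and `--project=.` is an assumption that mirrors the `activate .` step above.

```shell
# Run the CUDA.jl tests non-interactively and write stdout and stderr
# to a log file while still showing them on the terminal.
# Usage: log_cuda_tests [log-file]   (defaults to output.txt)
# NOTE: log_cuda_tests is a hypothetical helper for illustration.
log_cuda_tests() {
    log_file="${1:-output.txt}"
    julia --project=. \
        -e 'using Pkg; Pkg.test("CUDA"; test_args=`--verbose`)' \
        2>&1 | tee "$log_file"
}
```

Since `tee` writes lines as they arrive, whatever was printed before a freeze should still be in the file after a reboot, unless it was still sitting in a buffer at that moment.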