ERROR: could not load library libcutensor.so

After a fresh installation of CUDA.jl 1.1/1.0, we got the following error on a cluster when trying to precompile a package that has CUDA in its dependencies:

[ Info: Precompiling ParallelStencil [94395366-693c-11ea-3b26-d9b7aac5d958]
Downloading artifact: CUDNN_CUDA102
Downloading artifact: CUTENSOR_CUDA102
ERROR: LoadError: LoadError: LoadError: could not load library "/home/lraess/.julia/artifacts/fbe34931d3c1bebd56fbc2edba0f8ece5295fed7/lib/libcutensor.so"
/lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/lraess/.julia/artifacts/fbe34931d3c1bebd56fbc2edba0f8ece5295fed7/lib/libcutensor.so)
Stacktrace:

NOTE 1: the package does not use CUTENSOR.
NOTE 2: CUDA was installed as follows:

module load CUDA/10.0
export JULIA_CUDA_USE_BINARYBUILDER=false
julia
] add CUDA

We would very much appreciate quick help, as we need to run something there for the JuliaCon video!

Thanks!!

What version of glibc do you have on the cluster? It looks like something pretty old.
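For reference, two quick ways to check the glibc version on a Linux system (standard tools; the exact output format varies by distribution):

```shell
# print the version of the system's C library
getconf GNU_LIBC_VERSION        # e.g. "glibc 2.12"
# or:
ldd --version | head -n1        # first line names the glibc release
```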

Thanks for your reply @giordano. Yes, it looks like the libc.so.6 that is found is too old. There is another one available on the cluster, and when I add its path to LD_LIBRARY_PATH, ldd on libcutensor.so no longer reports the libc.so.6 error.
However, with that variable exported, Julia cannot be started: it segfaults at startup. Defining LD_LIBRARY_PATH inside Julia using ENV did not work either. It gave the following error:

julia> ENV["LD_LIBRARY_PATH"] = "/soft/glibc/glibc-2.17/lib:$(ENV["LD_LIBRARY_PATH"])"
"/soft/glibc/glibc-2.17/lib:/soft/glibc/glibc-2.17/lib:\$LD_LIBRARY_PATH"

julia> using ParallelStencil
[ Info: Precompiling ParallelStencil [94395366-693c-11ea-3b26-d9b7aac5d958]
ERROR: IOError: write: broken pipe (EPIPE)
Stacktrace:
 [1] uv_write(::Base.PipeEndpoint, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:953
 [2] unsafe_write(::Base.PipeEndpoint, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:1007
 [3] write(::Base.PipeEndpoint, ::String) at ./strings/io.jl:183
 [4] create_expr_cache(::String, ::String, ::Array{Pair{Base.PkgId,UInt64},1}, ::Base.UUID) at ./loading.jl:1176
 [5] compilecache(::Base.PkgId, ::String) at ./loading.jl:1261
 [6] _require(::Base.PkgId) at ./loading.jl:1029
 [7] require(::Base.PkgId) at ./loading.jl:927
 [8] require(::Module, ::Symbol) at ./loading.jl:922

Yeah, I wouldn’t expect changing this variable inside Julia to help much. Out of curiosity, do you have any CUDA module available on the cluster? Which version? Maybe you can try with an older version of CUDA provided by Julia.

CC: @maleadt

If that’s the case, you can use JULIA_CUDA_USE_BINARYBUILDER=false.
If you want to keep using artifacts, you can use an older version that does not provide CUTENSOR by specifying JULIA_CUDA_VERSION=.... Or you could disable the CUTENSOR version check in __runtime_init__ (in CUDA.jl’s initialization.jl); it’s that check that triggers use of the library.
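In shell terms, the two artifact-related options look roughly like this (the version string below is only a placeholder; pick one that matches your driver and does not ship CUTENSOR):

```shell
# Option A: skip artifacts entirely and use the locally installed toolkit
export JULIA_CUDA_USE_BINARYBUILDER=false

# Option B (alternative to A; placeholder version): keep artifacts but
# pin an older toolkit that does not provide CUTENSOR
export JULIA_CUDA_VERSION=10.1
```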

Actually, the snippet at the top is already setting JULIA_CUDA_USE_BINARYBUILDER=false :thinking:

Probably only during Pkg.add, and the value isn’t cached anywhere currently.

We exported it before Pkg.add, yes. Does one need to export it before every run then (at least when precompilation happens)?

The environment variable needs to be defined like that every time CUDA.jl is loaded, as it looks for artifacts whenever the module is initialized.
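In practice that means the export belongs in whatever environment launches Julia, not only in the install session; a minimal sketch (the job-script context is an example):

```shell
# must be set in the environment of every Julia session that may load
# (and hence precompile) CUDA.jl, e.g. in a job script or shell start-up file
export JULIA_CUDA_USE_BINARYBUILDER=false

# then start Julia as usual, e.g.:
#   julia -e 'using ParallelStencil'
```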

Oh, I see, this clarifies a lot. I will try…

Setting JULIA_CUDA_USE_BINARYBUILDER=false also for running (including precompiling) solved the problem, as it no longer uses artifacts.

BTW: I think it should be highlighted in the doc here that JULIA_CUDA_USE_BINARYBUILDER=false must not only be set for Pkg.add/Pkg.build.

Thanks!!

@maleadt, a small related question: am I right to assume that CUDA-aware MPI will not work when one uses artifacts? For CUDA-aware MPI, one needs a CUDA-aware MPI installation, and one must build CUDA.jl and MPI.jl against the CUDA and MPI used for that CUDA-aware MPI installation, right?

I don’t have experience with CUDA-aware MPI + different CUDA toolkits (maybe @vchuravy or @simonbyrne do). Generally the toolkit is backwards compatible, though.

Yes, for CUDA-aware MPI you should use the CUDA and MPI provided by your cluster administrator.

export JULIA_MPI_BINARY=system
export JULIA_CUDA_USE_BINARYBUILDER=false
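Put together, a typical cluster setup might look like the following sketch (the module names and the explicit rebuild step are assumptions about a typical site, not verbatim instructions):

```shell
# module load cuda openmpi                # example module names; site-specific

export JULIA_MPI_BINARY=system            # MPI.jl wraps the system (CUDA-aware) MPI
export JULIA_CUDA_USE_BINARYBUILDER=false # CUDA.jl uses the local toolkit

# rebuild MPI.jl once so it picks up the system library:
#   julia -e 'using Pkg; Pkg.build("MPI")'
```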

Thanks @vchuravy. This is what we have done. I am just trying to understand whether, in general, it would also be possible to use CUDA-aware MPI with artifacts, as I know multiple people getting started with small clusters / multi-GPU desktops with Julia, GPU, and MPI. To get started, or when a quick temporary solution is needed, it would be nice to be able to use CUDA-aware MPI with artifacts.

Seconding this. AFAIK CUDA-aware MPI is the only way to do multi-GPU synchronization on a desktop machine / local workstation, but unlike on a cluster it’s almost certainly not installed, and I haven’t been able to find binaries anywhere…

@maleadt, it would seem more intuitive to me that one only needs to specify JULIA_CUDA_USE_BINARYBUILDER=false at installation time. BTW: I think that is what @simonbyrne meant in this issue when he wrote: "It would also be useful to have a mechanism to save these preferences so that they persist between sessions/versions (FFTW.jl and MPI.jl save their preferences to a file in .julia/prefs/)".

I also believe that this would be an improvement, or is there anything that speaks against it?
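To illustrate the idea (purely hypothetical file name and format; this is not what CUDA.jl, FFTW.jl, or MPI.jl actually write): a persisted preference is essentially a tiny file written once, which every later session restores before Julia starts.

```shell
# hypothetical: record the preference once...
mkdir -p "$HOME/.julia/prefs"
echo 'JULIA_CUDA_USE_BINARYBUILDER=false' > "$HOME/.julia/prefs/CUDA.env"

# ...and re-export it at the start of every later session
set -a
. "$HOME/.julia/prefs/CUDA.env"
set +a
```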

There is some discussion of this issue here: https://github.com/JuliaGPU/CUDA.jl/issues/204

I was waiting for Pkg support before doing so: https://github.com/JuliaLang/Pkg.jl/pull/1835

Sure, that makes sense.