CuArrays/CUDAnative PSA: Simplified package loading

maleadt · August 23, 2019, 3:12pm

Hi all,

I’ve just tagged new versions of CuArrays/CUDAnative/… and among the usual slurry of features and bug fixes there’s a major change in how the packages are built and loaded. There used to be a Pkg.build step which could fail if you didn’t have a properly set-up CUDA GPU. This has made a lot of people very angry and been widely regarded as a bad move.

As an alternative, I have dropped this installation-time set-up and moved it to the precompilation phase, i.e., when you first import the package. As a result, you can now safely depend on CUDA packages since they won’t ever fail during installation. This is especially useful for clusters and containers, where you want to install packages in an environment that probably does not have a GPU.

Of course, loading the package might still fail if your user doesn’t have a CUDA GPU, so that’s why CUDAapi now provides a couple of useful functions to determine that:

using CUDAapi # this will NEVER fail
if has_cuda()
    try
        using CuArrays # we have CUDA, so this should not fail
    catch ex
        # something is wrong with the user's set-up (or there's a bug in CuArrays)
        @warn "CUDA is installed, but CuArrays.jl fails to load" exception=(ex,catch_backtrace())
    end
end

There’s also CUDAapi.has_cuda_gpu() to check if the user actually has a GPU.

As a result of all this, it should be possible to safely depend on any of the CUDA packages, without your users seeing errors because of not having a CUDA GPU. This is important, because it means we can use regular package version compatibility rules and don’t have to roll our own.

maleadt · August 24, 2019, 8:01am

Two notes based on user feedback:

If you see the error message LoadError: LoadError: UndefVarError: libcudnn not defined, this probably comes from Flux, which needs to be updated for the new version of CuArrays. Pin CuArrays for the time being. If the error comes from somewhere else, please file an issue.
Loading might fail with Could not find library 'cublas': this library should be part of the CUDA toolkit, so we’ve become more strict about it being available (since it underpins lots of essential functionality in CuArrays). Please make sure your CUDA installation is OK, and provides libcublas. If it does, run the failing using CuArrays with JULIA_DEBUG=CUDAapi and create an issue with details about your system and the location of libcublas.
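
Before filing an issue, it can help to check whether the dynamic loader can see the library at all. The following is a minimal sketch using only the Libdl standard library; the helper name locate_library and the candidate library names are assumptions for illustration (on Windows the cuBLAS DLL usually carries a version suffix), and this is not how CUDAapi itself performs discovery.

```julia
using Libdl

# Hypothetical helper (not part of CUDAapi): return the first name in `names`
# that the dynamic loader can open, or `nothing` if none can be opened.
function locate_library(names::AbstractVector{<:AbstractString})
    found = Libdl.find_library(names)   # returns "" when no candidate loads
    return isempty(found) ? nothing : found
end

# Candidate names are assumptions; adjust them for your platform.
for names in (["libcublas", "cublas"], ["libcudnn", "cudnn"])
    found = locate_library(names)
    if found === nothing
        println(first(names), ": not found on the loader search path")
    else
        println(first(names), ": loadable as ", found)
    end
end
```

If a library is reported as not found even though the file exists on disk, its directory is probably missing from the loader search path.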

simonbyrne · October 23, 2019, 6:50pm

I’ve tried this out on a machine without CUDA, but it seems that precompilation fails (on Julia 1.2.0):

(optgpu) pkg> st
    Status `~/misc/optgpu/Project.toml`
  [3895d2a7] CUDAapi v1.2.0
  [3a865a2d] CuArrays v1.2.1

(optgpu) pkg> precompile
Precompiling project...
Precompiling CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: Could not find CUDA driver library
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:33
 [3] include at ./boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1094
 [5] include(::Module, ::String) at ./Base.jl:31
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:330 [inlined]
 [8] eval(::Expr) at ./client.jl:432
 [9] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:27
ERROR: LoadError: Failed to precompile CUDAdrv [c5f51814-7f29-56b8-a69c-e4d8f6be1fde] to /Users/simon/.julia/compiled/v1.2/CUDAdrv/HMhfu.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1253
 [3] _require(::Base.PkgId) at ./loading.jl:1013
 [4] require(::Base.PkgId) at ./loading.jl:911
 [5] require(::Module, ::Symbol) at ./loading.jl:906
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1094
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:330 [inlined]
 [11] eval(::Expr) at ./client.jl:432
 [12] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CuArrays/wXQp8/src/CuArrays.jl:3

maleadt · October 24, 2019, 5:51am

Yeah, CuArrays not being loadable (which is why the code in the OP is guarded by a call to CUDAapi.has_cuda()) implies that it isn’t precompilable either. Well, specifically CUDAdrv fails to load here, but that’s the same story. We do discovery of the CUDA toolkit (e.g. paths for libraries to ccall) in global scope, so you’d need to precompile again on the final node anyway.
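
To illustrate the difference between discovery in global scope and discovery at load time, here is a rough sketch in plain Julia. The module DiscoverySketch, its constants, and the library name "libcuda" (the usual Linux name of the driver library) are assumptions made up for this example; it is not the actual CUDAdrv code.

```julia
module DiscoverySketch
using Libdl

# Global scope: evaluated when the module is compiled. In a precompiled
# package this result would be stored in the cache file and reused as-is.
const LIB_AT_COMPILE_TIME = Libdl.find_library(["libcuda"])

# Filled in by __init__, which runs every time the module is loaded.
const LIB_AT_LOAD_TIME = Ref{String}("")

function __init__()
    LIB_AT_LOAD_TIME[] = Libdl.find_library(["libcuda"])
end

end # module

println("global scope: ", repr(DiscoverySketch.LIB_AT_COMPILE_TIME))
println("__init__:     ", repr(DiscoverySketch.LIB_AT_LOAD_TIME[]))
```

Run as a script, both values agree because everything happens in one process. In a precompiled package, the global-scope value would reflect the machine that did the precompilation, which is why precompiling on the final node is needed.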

Michaela_Maskova · November 16, 2019, 1:10pm

Hi, I hope I’m not disturbing too much, but can someone guide me through this error I get when I try to use the Flux#master version?

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
ERROR: LoadError: LoadError: Could not find libcublas
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] top-level scope at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas\CUBLAS.jl:23
 [3] include at .\boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at .\loading.jl:1094
 [5] include at .\Base.jl:31 [inlined]
 [6] include(::String) at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:1
 [7] top-level scope at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:39
 [8] include at .\boot.jl:328 [inlined]
 [9] include_relative(::Module, ::String) at .\loading.jl:1094
 [10] include(::Module, ::String) at .\Base.jl:31
 [11] top-level scope at none:2
 [12] eval at .\boot.jl:330 [inlined]
 [13] eval(::Expr) at .\client.jl:432
 [14] top-level scope at .\none:3
in expression starting at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas\CUBLAS.jl:16
in expression starting at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:39
ERROR: LoadError: Failed to precompile CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae] to C:\Users\masen\.julia\compiled\v1.2\CuArrays\7YFE0.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1253
 [3] _require(::Base.PkgId) at .\loading.jl:1013
 [4] require(::Base.PkgId) at .\loading.jl:911
 [5] require(::Module, ::Symbol) at .\loading.jl:906
 [6] include at .\boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at .\loading.jl:1094
 [8] include(::Module, ::String) at .\Base.jl:31
 [9] top-level scope at none:2
 [10] eval at .\boot.jl:330 [inlined]
 [11] eval(::Expr) at .\client.jl:432
 [12] top-level scope at .\none:3
in expression starting at C:\Users\masen\.julia\packages\Flux\jXyco\src\Flux.jl:25
ERROR: Failed to precompile Flux [587475ba-b771-5e3f-ad9e-33799f191a9c] to C:\Users\masen\.julia\compiled\v1.2\Flux\QdkVy.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1253
 [3] _require(::Base.PkgId) at .\loading.jl:1013
 [4] require(::Base.PkgId) at .\loading.jl:911
 [5] require(::Module, ::Symbol) at .\loading.jl:906

I checked and there’s libcublas.jl in C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas folder.
I’m not very experienced with packages and I don’t even know how to try and solve this problem. Any help would be appreciated.

maleadt · November 17, 2019, 10:02am

libcublas.jl is the wrapper we ship, it’s the underlying library libcublas.so that isn’t being found. Do you have that somewhere on your system?

baggepinnen · November 19, 2019, 5:50am

I am also trying my luck with Flux#master to get CuArrays updated and hopefully avoid out-of-memory problems. The error I’m seeing on

(v1.3) pkg> st Flux
    Status `~/.julia/environments/v1.3/Project.toml`
  [5ae59095] Colors v0.9.6
  [3a865a2d] CuArrays v1.4.7 # I've tried latest master as well
  [587475ba] Flux v0.9.0 [`~/.julia/dev/Flux`]

is

could not load library "libcudnn"
libcudnn.so: cannot open shared object file: No such file or directory
#dlopen#3(::Bool, ::typeof(Libdl.dlopen), ::String, ::UInt32) at Libdl.jl:109
dlopen at Libdl.jl:109 [inlined]
dlopen at Libdl.jl:109 [inlined]
(::CuArrays.CUDNN.var"#6478#lookup_fptr#99")() at call.jl:29
macro expansion at call.jl:37 [inlined]
macro expansion at error.jl:17 [inlined]
cudnnGetProperty(::CUDAapi.libraryPropertyType, ::Base.RefValue{Int32}) at libcudnn.jl:27
cudnnGetProperty at base.jl:9 [inlined]
version() at base.jl:13
#conv!#362(::Int64, ::Int64, ::typeof(conv!), ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at nnlib.jl:46
conv!(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at nnlib.jl:46
macro expansion at conv.jl:114 [inlined]
#conv#97(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(conv), ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at

This time, I have tried to restart the computer but the problem persists.

Edit:
I managed to hack my way around it by

sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/lib/libcudnn

which maybe is not the best solution, but I couldn’t figure out a better way.

Michaela_Maskova · November 19, 2019, 6:27am

Can’t really find it (that doesn’t have to mean it’s not on my computer). Where should it be, and if it’s missing, is there a way to install it manually somehow?

maleadt · November 19, 2019, 6:47am

baggepinnen:

I managed to hack my way around it by
sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/lib/libcudnn
which maybe is not the best solution, but I couldn’t figure out a better way.

/usr/lib/x86_64-linux-gnu should be on your LD_LIBRARY_PATH. If you do just using Libdl; Libdl.dlopen("libcudnn") it should work.
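
A slightly more verbose version of that check, as a sketch: it prints the current LD_LIBRARY_PATH and the file that the loader actually resolves for libcudnn. The helper name resolve_library is hypothetical; the Libdl calls (dlopen with throw_error=false, dlpath, dlclose) are from the standard library and need Julia 1.1 or later.

```julia
using Libdl

# Hypothetical helper: try to open `name` and return the path of the file
# the loader resolved it to, or `nothing` if it cannot be opened.
function resolve_library(name::AbstractString)
    handle = Libdl.dlopen(name; throw_error=false)  # `nothing` on failure
    handle === nothing && return nothing
    try
        return Libdl.dlpath(handle)
    finally
        Libdl.dlclose(handle)
    end
end

println("LD_LIBRARY_PATH = ", get(ENV, "LD_LIBRARY_PATH", "(unset)"))
println("libcudnn resolves to: ", something(resolve_library("libcudnn"), "(not found)"))
```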

Also, you should have seen a warning when loading Flux that CUDNN (or worse, CUDA) isn’t available.

If you install the CUDA toolkit, it provides these libraries. Are you sure you have it installed?

baggepinnen · November 19, 2019, 7:21am

Thanks, I’ll set the path accordingly.

This warning did not appear

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
ERROR: could not load library "libcudnn"
libcudnn.so: cannot open shared object file: No such file or directory

Also, before I updated Flux to master, it worked well without me setting any LD_LIBRARY_PATH.

maleadt · November 19, 2019, 7:26am

You are using a “random” version of Flux – the warning is on master, and older versions shouldn’t have upgraded to the version of CuArrays you’re using (bounded by the package manager).

This is related as well, the check for CUDNN availability has changed, so your older version of Flux thinks it is available. It would have ‘worked’ before, but probably Flux would have decided not to use the CUDNN functionality at all (since it couldn’t find the library).

Michaela_Maskova · November 19, 2019, 1:30pm

Well, there’s the problem. I don’t have CUDA and, if I understand it correctly, I can’t even install it as I don’t have an NVIDIA graphics card…

Does it mean that I can’t run Flux on my computer?

maleadt · November 19, 2019, 1:32pm

No, you are using a version of Flux that doesn’t correctly detect CUDA availability. Use a released version, or the latest master.

Michaela_Maskova · November 19, 2019, 1:49pm

Thanks for the clarification. At least now I know what the problem is.

maleadt · November 19, 2019, 5:24pm

Now that the PR is merged, does the master branch work again for you? (I forgot that it hadn’t been merged yet)

Michaela_Maskova · November 20, 2019, 12:49pm

Still doesn’t work. I tried it on Julia 1.2.0, 1.3rc4 and 1.3rc5. Same error.

maleadt · November 20, 2019, 1:39pm

Julia version doesn’t matter. Did you update packages? Could you give some details about the rest of your environment, maybe upload a Manifest? I just tried with a Docker image that doesn’t have CUDA, and everything works as expected:

$ docker run -it --rm julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(v1.2) pkg> add Flux
 Installed CuArrays ───────────── v1.2.1
 Installed Flux ───────────────── v0.9.0
julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> Flux.has_cuarrays()
false

julia>  gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
Tracked 8×8×1×1 Array{Float64,4}:

(v1.2) pkg> add Flux#master
   Cloning git-repo `https://github.com/FluxML/Flux.jl.git`
  Updating git-repo `https://github.com/FluxML/Flux.jl.git`
 Installed CuArrays ───────────── v1.4.7

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)

julia> Flux.use_cuda[]
false

julia> gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
8×8×1×1 Array{Float64,4}:

ozmaden · July 2, 2020, 2:20pm

Excuse me for bringing up this old thread.

I’ve been searching for ways to conditionally use the GPU in my code as well and a few threads I read point towards this one. As of writing this, the original post is deprecated as everything gets migrated over to CUDA.jl.

I’ve found this documentation (after stumbling upon this issue) very useful.
