CuArrays/CUDAnative PSA: Simplified package loading

Hi all,

I’ve just tagged new versions of CuArrays/CUDAnative/… and among the usual slurry of features and bug fixes there’s a major change in how the packages are built and loaded. There used to be a Pkg.build step which could fail if you didn’t have a properly set-up CUDA GPU. This has made a lot of people very angry and been widely regarded as a bad move.

As an alternative, I have dropped this installation-time set-up and moved it to the precompilation phase, i.e., when you first import the package. As a result, you can now safely depend on CUDA packages since they won’t ever fail during installation. This is especially useful for clusters and containers, where you want to install packages in an environment that probably does not have a GPU.

Of course, loading the package might still fail if your user doesn’t have a CUDA GPU, so that’s why CUDAapi now provides a couple of useful functions to determine that:

using CUDAapi # this will NEVER fail
if has_cuda()
    try
        using CuArrays # we have CUDA, so this should not fail
    catch ex
        # something is wrong with the user's set-up (or there's a bug in CuArrays)
        @warn "CUDA is installed, but CuArrays.jl fails to load" exception=(ex,catch_backtrace())
    end
end

There’s also CUDAapi.has_cuda_gpu() to check if the user actually has a GPU.

As a result of all this, it should be possible to safely depend on any of the CUDA packages, without your users seeing errors because of not having a CUDA GPU. This is important, because it means we can use regular package version compatibility rules and don’t have to roll our own.

Two notes based on user feedback:

  1. if you see an error message LoadError: LoadError: UndefVarError: libcudnn not defined, this probably comes from Flux, which needs to be updated for the new version of CuArrays. Pin CuArrays for the time being. If the error comes from somewhere else, please file an issue.

  2. Loading might fail with Could not find library 'cublas': this library should be part of the CUDA toolkit, so we’ve become more strict about it being available (since it underpins lots of essential functionality in CuArrays). Please make sure your CUDA installation is OK, and provides libcublas. If it does, run the failing using CuArrays with JULIA_DEBUG=CUDAapi and create an issue with details about your system and the location of libcublas.
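
Before filing an issue about the second note, it can help to check whether the dynamic loader sees cuBLAS at all. The following is only a minimal sketch using the Libdl standard library, not the discovery logic CUDAapi itself uses. The helper name `locate_library` and the candidate names are assumptions for illustration; on Windows the DLL name typically carries a version suffix, so adjust the list for your platform.

```julia
using Libdl

# Hypothetical helper: return the first candidate name that the dynamic
# loader can resolve, or `nothing` when none of them can be found.
function locate_library(candidates::Vector{String}, extra_dirs::Vector{String}=String[])
    # Libdl.find_library returns an empty string when nothing matches.
    found = Libdl.find_library(candidates, extra_dirs)
    return isempty(found) ? nothing : found
end

# Candidate names are assumptions; adjust them for your platform.
cublas = locate_library(["libcublas", "cublas"])
if cublas === nothing
    println("cuBLAS is not visible to the dynamic loader")
else
    println("cuBLAS resolved as: ", cublas)
end
```

If the library only resolves once you pass the toolkit's library directory as `extra_dirs`, that directory is probably missing from the loader's search path.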

I’ve tried this out on a machine without CUDA, but it seems that precompilation fails (on Julia 1.2.0):

(optgpu) pkg> st
    Status `~/misc/optgpu/Project.toml`
  [3895d2a7] CUDAapi v1.2.0
  [3a865a2d] CuArrays v1.2.1

(optgpu) pkg> precompile
Precompiling project...
Precompiling CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: Could not find CUDA driver library
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:33
 [3] include at ./boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1094
 [5] include(::Module, ::String) at ./Base.jl:31
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:330 [inlined]
 [8] eval(::Expr) at ./client.jl:432
 [9] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:27
ERROR: LoadError: Failed to precompile CUDAdrv [c5f51814-7f29-56b8-a69c-e4d8f6be1fde] to /Users/simon/.julia/compiled/v1.2/CUDAdrv/HMhfu.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1253
 [3] _require(::Base.PkgId) at ./loading.jl:1013
 [4] require(::Base.PkgId) at ./loading.jl:911
 [5] require(::Module, ::Symbol) at ./loading.jl:906
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1094
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:330 [inlined]
 [11] eval(::Expr) at ./client.jl:432
 [12] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CuArrays/wXQp8/src/CuArrays.jl:3

Yeah, CuArrays not being loadable (as shown in the OP, guarded by a call to CUDAapi.has_cuda()) implies that it isn’t precompilable either. Well, specifically CUDAdrv fails to load here, but that’s the same story. We do discovery of the CUDA toolkit (e.g. paths for libraries to ccall) in global scope, so you’d need to precompile again on the final node anyway.
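
To illustrate what "in global scope" means here, a rough sketch follows. It is not how CUDAdrv or CuArrays are implemented; the module name `DiscoveryDemo` and the library name `libcuda` (the driver library name on Linux) are assumptions for illustration. In a precompiled package, top-level code runs once during precompilation and its results end up in the cache file, whereas `__init__` runs every time the module is loaded.

```julia
module DiscoveryDemo

using Libdl

# Evaluated when the module body runs. In a precompiled package this happens
# once, during precompilation, and the result is stored in the cache file.
const LIB_AT_TOPLEVEL = Libdl.find_library(["libcuda"])

# Filled in by __init__, which runs every time the module is loaded.
const LIB_AT_LOAD = Ref{String}("")

function __init__()
    LIB_AT_LOAD[] = Libdl.find_library(["libcuda"])
end

end # module

# Both values are empty strings when the library cannot be found.
println("top level: ", repr(DiscoveryDemo.LIB_AT_TOPLEVEL))
println("at load:   ", repr(DiscoveryDemo.LIB_AT_LOAD[]))
```

Run as a plain script, both lookups happen in the same process and agree. The difference only shows up for a real package, where a value computed at top level on a GPU-less node would be reused from the cache on the final node unless the package is precompiled again there.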

Hi, I hope I’m not disturbing too much, but can someone guide me through this error I get when I try to use the Flux#master version?

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
ERROR: LoadError: LoadError: Could not find libcublas
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] top-level scope at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas\CUBLAS.jl:23
 [3] include at .\boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at .\loading.jl:1094
 [5] include at .\Base.jl:31 [inlined]
 [6] include(::String) at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:1
 [7] top-level scope at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:39
 [8] include at .\boot.jl:328 [inlined]
 [9] include_relative(::Module, ::String) at .\loading.jl:1094
 [10] include(::Module, ::String) at .\Base.jl:31
 [11] top-level scope at none:2
 [12] eval at .\boot.jl:330 [inlined]
 [13] eval(::Expr) at .\client.jl:432
 [14] top-level scope at .\none:3
in expression starting at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas\CUBLAS.jl:16
in expression starting at C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\CuArrays.jl:39
ERROR: LoadError: Failed to precompile CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae] to C:\Users\masen\.julia\compiled\v1.2\CuArrays\7YFE0.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1253
 [3] _require(::Base.PkgId) at .\loading.jl:1013
 [4] require(::Base.PkgId) at .\loading.jl:911
 [5] require(::Module, ::Symbol) at .\loading.jl:906
 [6] include at .\boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at .\loading.jl:1094
 [8] include(::Module, ::String) at .\Base.jl:31
 [9] top-level scope at none:2
 [10] eval at .\boot.jl:330 [inlined]
 [11] eval(::Expr) at .\client.jl:432
 [12] top-level scope at .\none:3
in expression starting at C:\Users\masen\.julia\packages\Flux\jXyco\src\Flux.jl:25
ERROR: Failed to precompile Flux [587475ba-b771-5e3f-ad9e-33799f191a9c] to C:\Users\masen\.julia\compiled\v1.2\Flux\QdkVy.ji.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at .\loading.jl:1253
 [3] _require(::Base.PkgId) at .\loading.jl:1013
 [4] require(::Base.PkgId) at .\loading.jl:911
 [5] require(::Module, ::Symbol) at .\loading.jl:906

I checked and there’s libcublas.jl in C:\Users\masen\.julia\packages\CuArrays\7z7MV\src\blas folder.
I’m not very experienced with packages and I don’t even know how to try and solve this problem. Any help would be appreciated.

libcublas.jl is the wrapper we ship; it’s the underlying library, libcublas.so, that isn’t being found. Do you have that somewhere on your system?

I am also trying my luck with Flux#master to get CuArrays updated and hopefully avoid out-of-memory problems. The error I’m seeing on

(v1.3) pkg> st Flux
    Status `~/.julia/environments/v1.3/Project.toml`
  [5ae59095] Colors v0.9.6
  [3a865a2d] CuArrays v1.4.7 # I've tried latest master as well
  [587475ba] Flux v0.9.0 [`~/.julia/dev/Flux`]

is

could not load library "libcudnn"
libcudnn.so: cannot open shared object file: No such file or directory
#dlopen#3(::Bool, ::typeof(Libdl.dlopen), ::String, ::UInt32) at Libdl.jl:109
dlopen at Libdl.jl:109 [inlined]
dlopen at Libdl.jl:109 [inlined]
(::CuArrays.CUDNN.var"#6478#lookup_fptr#99")() at call.jl:29
macro expansion at call.jl:37 [inlined]
macro expansion at error.jl:17 [inlined]
cudnnGetProperty(::CUDAapi.libraryPropertyType, ::Base.RefValue{Int32}) at libcudnn.jl:27
cudnnGetProperty at base.jl:9 [inlined]
version() at base.jl:13
#conv!#362(::Int64, ::Int64, ::typeof(conv!), ::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at nnlib.jl:46
conv!(::CuArrays.CuArray{Float32,4,Nothing}, ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at nnlib.jl:46
macro expansion at conv.jl:114 [inlined]
#conv#97(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(conv), ::CuArrays.CuArray{Float32,4,CuArrays.CuArray{Float32,1,Nothing}}, ::CuArrays.CuArray{Float32,4,Nothing}, ::DenseConvDims{2,(7, 1),1,15,(1, 1),(2, 2, 0, 0),(1, 1),false}) at 

This time, I have tried to restart the computer but the problem persists.

Edit:
I managed to hack my way around it by

sudo ln -s /usr/lib/x86_64-linux-gnu/libcudnn.so.7 /usr/lib/libcudnn

which maybe is not the best solution, but I couldn’t figure out a better way.

Can’t really find it (that doesn’t have to mean it’s not on my computer). Where should it be, and if it’s missing, is there a way to install it manually somehow?

/usr/lib/x86_64-linux-gnu should be on your LD_LIBRARY_PATH. If you do just using Libdl; Libdl.dlopen("libcudnn") it should work.
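
A slightly more verbose variant of that check, as a sketch: it reports which file the loader actually picked up instead of throwing on failure. The helper name `resolved_path` is hypothetical, and the `throw_error` keyword assumes Julia 1.1 or newer.

```julia
using Libdl

# Hypothetical helper: try to open a shared library by name. Returns the
# resolved path on success and `nothing` on failure, instead of throwing.
function resolved_path(name::AbstractString)
    handle = Libdl.dlopen(name; throw_error=false)
    handle === nothing && return nothing
    try
        return Libdl.dlpath(handle)
    finally
        # Release the handle again; this only inspects the library.
        Libdl.dlclose(handle)
    end
end

path = resolved_path("libcudnn")
println(path === nothing ? "libcudnn could not be opened" : "libcudnn found at $path")
```

If this prints a path under an unexpected directory, a stale or mismatched copy of the library may be shadowing the one you intended to use.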

Also, you should have seen a warning when loading Flux that CUDNN (or worse, CUDA) isn’t available.

If you install the CUDA toolkit, it provides these libraries. Are you sure you have it installed?

Thanks, I’ll set the path accordingly.

This warning did not appear

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
ERROR: could not load library "libcudnn"
libcudnn.so: cannot open shared object file: No such file or directory

also, it worked well without me setting any LD_LIBRARY_PATH before I updated Flux to master.

You are using a “random” version of Flux – the warning is on master, and older versions shouldn’t have upgraded to the version of CuArrays you’re using (bounded by the package manager).

This is related as well: the check for CUDNN availability has changed, so your older version of Flux thinks it is available. It would have ‘worked’ before, but probably Flux would have decided not to use the CUDNN functionality at all (since it couldn’t find the library).

Well, there’s the problem. I don’t have CUDA and, if I understand it correctly, I can’t even install it as I don’t have an NVIDIA graphics card…

Does it mean that I can’t run Flux on my computer?

No, you are using a version of Flux that doesn’t correctly detect CUDA availability. Use a released version, or the latest master.

Thanks for the clarification. At least now I know what the problem is.

Now that the PR is merged, does the master branch work again for you? (I forgot that it hadn’t been merged yet)

Still doesn’t work. I tried it on Julia 1.2.0, 1.3rc4 and 1.3rc5. Same error.

Julia version doesn’t matter. Did you update packages? Could you give some details about the rest of your environment, maybe upload a Manifest? I just tried with a Docker image that doesn’t have CUDA, and everything works as expected:

$ docker run -it --rm julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.2.0 (2019-08-20)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(v1.2) pkg> add Flux
 Installed CuArrays ───────────── v1.2.1
 Installed Flux ───────────────── v0.9.0
julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]

julia> Flux.has_cuarrays()
false

julia>  gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
Tracked 8×8×1×1 Array{Float64,4}:
(v1.2) pkg> add Flux#master
   Cloning git-repo `https://github.com/FluxML/Flux.jl.git`
  Updating git-repo `https://github.com/FluxML/Flux.jl.git`
 Installed CuArrays ───────────── v1.4.7

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)

julia> Flux.use_cuda[]
false

julia> gpu(Conv((2,2), 1=>1, relu, dilation=2))(gpu(randn(10,10,1,1)))
8×8×1×1 Array{Float64,4}: