CuArrays/CUDAnative PSA: Simplified package loading

Hi all,

I’ve just tagged new versions of CuArrays/CUDAnative/… and among the usual slew of features and bug fixes there’s a major change in how the packages are built and loaded. There used to be a build step which could fail if you didn’t have a properly set-up CUDA GPU. This has made a lot of people very angry and been widely regarded as a bad move.

As an alternative, I have dropped this installation-time set-up and moved it to the precompilation phase, i.e., when you first import the package. As a result, you can now safely depend on CUDA packages since they won’t ever fail during installation. This is especially useful for clusters and containers, where you want to install packages in an environment that probably does not have a GPU.

Of course, loading the package might still fail if your user doesn’t have a CUDA GPU, which is why CUDAapi now provides a couple of useful functions to check for that:

using CUDAapi # this will NEVER fail
if has_cuda()
    try
        using CuArrays # we have CUDA, so this should not fail
    catch ex
        # something is wrong with the user's set-up (or there's a bug in CuArrays)
        @warn "CUDA is installed, but CuArrays.jl fails to load" exception=(ex,catch_backtrace())
    end
end

There’s also CUDAapi.has_cuda_gpu() to check if the user actually has a GPU.
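For example, you could gate GPU code on an actual device being present; a sketch (the CPU fallback here is purely illustrative):

```julia
using CUDAapi          # safe: never fails, even without CUDA installed

# has_cuda() checks for a usable CUDA installation; has_cuda_gpu()
# additionally checks that at least one CUDA device is present.
if has_cuda_gpu()
    using CuArrays
    xs = cu(rand(1024))    # array lives on the GPU
else
    xs = rand(1024)        # plain CPU array as a fallback
end
```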

As a result of all this, it should be possible to safely depend on any of the CUDA packages, without your users seeing errors because of not having a CUDA GPU. This is important, because it means we can use regular package version compatibility rules and don’t have to roll our own.


Two notes based on user feedback:

  1. If you see an error message LoadError: LoadError: UndefVarError: libcudnn not defined, it probably comes from Flux, which needs to be updated for the new version of CuArrays. Pin CuArrays to an older version for the time being. If the error comes from somewhere else, please file an issue.

  2. Loading might fail with Could not find library 'cublas': this library should be part of the CUDA toolkit, so we’ve become more strict about it being available (since it underpins lots of essential functionality in CuArrays). Please make sure your CUDA installation is OK, and provides libcublas. If it does, run the failing using CuArrays with JULIA_DEBUG=CUDAapi and create an issue with details about your system and the location of libcublas.
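To capture that debug output, it’s easiest to set the environment variable for a single Julia invocation, e.g. (the -e snippet is just one way to reproduce the failing import):

```shell
# enable CUDAapi's debug logging while reproducing the failure
JULIA_DEBUG=CUDAapi julia -e 'using CuArrays'
```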


I’ve tried this out on a machine without CUDA, but it seems that precompilation fails (on Julia 1.2.0):

(optgpu) pkg> st
    Status `~/misc/optgpu/Project.toml`
  [3895d2a7] CUDAapi v1.2.0
  [3a865a2d] CuArrays v1.2.1

(optgpu) pkg> precompile
Precompiling project...
Precompiling CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: Could not find CUDA driver library
 [1] error(::String) at ./error.jl:33
 [2] top-level scope at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:33
 [3] include at ./boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1094
 [5] include(::Module, ::String) at ./Base.jl:31
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:330 [inlined]
 [8] eval(::Expr) at ./client.jl:432
 [9] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CUDAdrv/ADRHQ/src/CUDAdrv.jl:27
ERROR: LoadError: Failed to precompile CUDAdrv [c5f51814-7f29-56b8-a69c-e4d8f6be1fde] to /Users/simon/.julia/compiled/v1.2/CUDAdrv/HMhfu.ji.
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1253
 [3] _require(::Base.PkgId) at ./loading.jl:1013
 [4] require(::Base.PkgId) at ./loading.jl:911
 [5] require(::Module, ::Symbol) at ./loading.jl:906
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1094
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:330 [inlined]
 [11] eval(::Expr) at ./client.jl:432
 [12] top-level scope at ./none:3
in expression starting at /Users/simon/.julia/packages/CuArrays/wXQp8/src/CuArrays.jl:3

Yeah, CuArrays not being loadable (as shown in the OP, guarded by a call to CUDAapi.has_cuda()) implies that it isn’t precompilable either. Well, specifically CUDAdrv fails to load here, but that’s the same story. We do discovery of the CUDA toolkit (e.g. paths of libraries to ccall) at global scope, so you’d need to precompile again on the final node anyway.
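On a cluster that means splitting installation from the first load; a hypothetical job flow (the project name is a placeholder):

```shell
# on the login node (no GPU): installation never fails
julia --project=myenv -e 'using Pkg; Pkg.instantiate()'

# on the GPU node, before the real job: the first import triggers
# precompilation, which is when CUDA discovery actually runs
julia --project=myenv -e 'using CUDAapi; has_cuda() && @eval using CuArrays'
```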