How to add CUDA as an optional dependency?

Suppose I have a package and want to give end users an option to speed up their calculations when an NVIDIA GPU is present. Not everyone will benefit from this option, so I can’t add CUDA as a hard dependency. Is there a solution specifically designed for the CUDA.jl stack that allows package developers to add the stack as an optional dependency?

Note: I am aware of Requires.jl but I can’t afford the lack of support for semantic versioning.


This is probably not what you are looking for but in principle you could split up your code into three parts: general, GPU-specific, and CPU-specific.

Then users could do something like:

using GeneralPackage, GPUPackage

result = GeneralPackage.calculate(inputs, GPUPackage.Engine())

or conversely

using GeneralPackage, CPUPackage

result = GeneralPackage.calculate(inputs, CPUPackage.Engine())
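A minimal sketch of this split, with the hypothetical packages collapsed into modules of a single file so that it runs standalone. The names GeneralPackage, CPUPackage, Engine and calculate come from the snippet above; the body of calculate is made up for illustration:

```julia
# Hypothetical stand-ins for separate packages; in practice each module
# would live in its own package with its own Project.toml.
module GeneralPackage
    # Generic entry point: the engine argument selects the backend via dispatch.
    function calculate end
end

module CPUPackage
    import ..GeneralPackage
    struct Engine end
    # CPU method: plain Julia arrays, no GPU dependency.
    GeneralPackage.calculate(inputs, ::Engine) = sum(abs2, inputs)
end

using .GeneralPackage, .CPUPackage

result = GeneralPackage.calculate([1.0, 2.0, 3.0], CPUPackage.Engine())
```

A GPUPackage would define its own Engine type and its own calculate method, and only that package would need to list CUDA.jl in its dependencies.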

Thank you @josuagrw, I will implement the proposed solution if there is no alternative.


Currently, Pkg.jl does not support conditional provisioning.

Is it a problem to include CUDA as a dependency?
If there is no GPU available, you can then decide what to do.
That is also mentioned in the CUDA.jl docs.


You can use Requires.jl for this purpose, eg:

But that’s a very limited scope solution, unfortunately.

There is a simple solution to this, which has been adopted by many packages already:

  1. add CUDA.jl as a normal dependency
  2. do using CUDA in your package
  3. check at runtime whether CUDA is functional (with CUDA.functional()) and use the GPU only if it is.

Installing CUDA.jl, running using CUDA, and compiling code that uses CUDA.jl all work without error on a system without a GPU. Only actually trying to use the GPU at runtime would fail, and you avoid that by not doing so when CUDA.functional() returns false.
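The three steps above might look roughly like this inside a package. This is only a sketch, assuming CUDA.jl is listed in the package's [deps]; to_device and scaled_sum are hypothetical names, while CUDA.functional() and CuArray come from CUDA.jl:

```julia
using CUDA  # loads without error on a machine without a GPU

# Hypothetical helper: decide at call time, not at load time.
to_device(x::AbstractArray) = CUDA.functional() ? CuArray(x) : x

# Hypothetical computation written once against the array interface.
function scaled_sum(x::AbstractArray, a::Real)
    y = to_device(x)      # CuArray on a working GPU setup, unchanged otherwise
    return sum(a .* y)    # broadcast and reduction run wherever y lives
end

scaled_sum(Float32[1, 2, 3], 2)
```

The same call works on both kinds of machine; only the place where the arithmetic runs differs.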

Example:

  • here is how it is done in ImplicitGlobalGrid

Will any system that runs Julia successfully build CUDA.jl? How often does the build fail on different OSes, platforms, etc.? Ideally I would like to avoid an extra dependency, especially if it could reduce the number of places where a package can run.

We put a lot of effort into making sure that CUDA.jl can be loaded even if CUDA is not available. You can use CUDA.functional() to check whether your system has CUDA available.

julia> using CUDA

julia> CUDA.functional()
false

CUDA.jl does nothing before its first use. Even artifacts are downloaded at run time. So the only ‘cost’ is additional dependencies, and package load time. The latter can still be optimized, so if somebody’s familiar with the recent techniques for improving latency (optimizing precompilation, avoiding invalidations) that would be a valuable contribution.


Yeah, the only question is whether you want to pull in a few GB for CUDA.jl from a small package when the user may not even have a CUDA-enabled GPU… :wink:


Yes, that is a big concern I have as well. Downloading a bunch of binaries and packages always seems too heavy, especially if you take into account developing countries with poor internet connections. Even though CUDA.functional() is wonderful (thanks for sharing), it still doesn’t fully solve the issue from a software engineering perspective.

Maybe someday we could have a julia --gpu flag or something baked into the language, like we have for threads and distributed computing? :slight_smile: I am just wondering if that would make sense at all. There would be no need to install CUDA, OpenCL, … dependencies manually; the language itself would have GPU support built in.

Where are those gigabytes going to come from?

Just make sure you also call CUDA.functional() lazily. You shouldn’t do that during __init__(), and definitely not globally.

For an example, look at Flux.jl: Flux.jl/functor.jl at e1268a6467ed78e1009b60972a1b9195525a0e0f · FluxML/Flux.jl · GitHub
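To illustrate what "lazily" means here, a small standalone sketch of a deferred, cached check. LazyCheck is a hypothetical helper, not part of CUDA.jl or Flux.jl; probe is any zero-argument function returning a Bool, and in a package that depends on CUDA.jl it would be CUDA.functional (not loaded here, so the sketch runs without it):

```julia
# Hypothetical helper: stores a probe and runs it only when first asked.
mutable struct LazyCheck{F}
    probe::F
    result::Union{Nothing,Bool}
end
LazyCheck(probe) = LazyCheck(probe, nothing)

function (c::LazyCheck)()
    if c.result === nothing
        c.result = c.probe()   # runs on the first call only
    end
    return c.result::Bool
end

# Stand-in probe that counts its calls; a real package would pass CUDA.functional.
calls = Ref(0)
gpu_ok = LazyCheck(() -> (calls[] += 1; false))

gpu_ok()   # the probe runs here, at first use
gpu_ok()   # cached, the probe is not called again
```

Constructing the LazyCheck at load time is harmless because nothing is probed until the first call. By contrast, a top-level const USE_GPU = CUDA.functional() would be evaluated while the module is being precompiled, and a call in __init__() would run every time the package is loaded.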


@maleadt can you comment on this current list of dependencies in CUDA.jl’s Project.toml?

[deps]
AbstractFFTs = "621f4979-c628-5d54-868e-fcf4e3e8185c"
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
BFloat16s = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
CEnum = "fa961155-64e5-5f13-b03f-caf6b980ea82"
CompilerSupportLibraries_jll = "e66e0078-7015-5450-92f7-15fbd957f2ae"
ExprTools = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
GPUCompiler = "61eb1bfa-7361-4325-ad38-22787b887f55"
LLVM = "929cbde3-209d-540e-8aea-75f648917ca0"
LazyArtifacts = "4af54fe1-eca0-43a8-85a7-787d91b784e3"
Libdl = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Random123 = "74087812-796a-5b5d-8853-05524746bad3"
RandomNumbers = "e6cf234a-135c-5ec9-84dd-332b85af5143"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"

From heaviest to lightest, which ones could potentially introduce non-trivial issues during installation? I see a lot of compiler packages, but I don’t understand how challenging it is to install these packages on old machines (thinking of students in developing countries).

No, most of those packages are fairly small. We’ve taken care to isolate larger dependencies in separate packages (like NNlibCUDA.jl) or by using Requires.jl.
There’s a couple of packages that require binary dependencies, but those work well enough nowadays to not be a problem.


from the fact that one of the binary libraries may link against CUDA_jll.

SCS is a tiny C library (hundreds of KB); we ship a separate SCS_GPU_jll which has CUDA_full_jll as a build dependency only. The aim is to add the GPU functionality to the package only conditionally, when CUDA_full is already present.

Or is there maybe a better way of doing this?

You mean CUDA_jll? CUDA_full_jll should never be present, as you mention it’s a build dep only.

That said, currently the way CUDA.jl handles its CUDA toolkit dependency is entirely separate from CUDA_jll.jl. We hope to unify that in the future, but as of now you can’t have both loaded at the same time.


We use CUDA_full_jll as a build dependency and require users to run using CUDA_jll before using SCS. This way we can define GPU-related functionality in SCS.jl through Requires and only include it when CUDA is present; this defines, for example, some constants in the package.

However, maybe it’s time to revisit the solution. What do you suggest?

That’s the correct set-up indeed, but you mentioned CUDA_full twice in your previous post, so I assume you meant CUDA_jll. This solution doesn’t integrate with CUDA.jl yet, but I hope to take another look at this in the new year.


Adding another dimension to the question, I wonder what is the future of

It seemed like an interesting idea to ask for computational resources explicitly in user scripts. If the language had this notion of a resource built in, I guess we could switch between different resources without depending on packages? Just a set of abstract types defining the resources, and a convention to implement all Base methods with the resource as the first argument when possible, falling back to the CPU.
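To make the idea concrete, a toy sketch of such a convention. Every name here (AbstractResource, CPU, GPU, total) is hypothetical and nothing like it exists in Base; the GPU type is a placeholder with no real GPU code behind it:

```julia
# Hypothetical resource hierarchy.
abstract type AbstractResource end
struct CPU <: AbstractResource end
struct GPU <: AbstractResource end   # placeholder only

# Convention: the resource is the first argument.
# The CPU method is the reference implementation.
total(::CPU, x) = sum(x)

# Fallback: any resource without a specialized method uses the CPU path.
total(::AbstractResource, x) = total(CPU(), x)

total(GPU(), [1, 2, 3])   # no GPU method is defined, so this runs on the CPU
```

A GPU package could later add total(::GPU, x) without touching user scripts, which only ever name the resource they want.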
