How to ignore 1 of the 2 GPUs on my machine

I’m trying to run a piece of code on a different machine than the one it was written on. This new machine has two GPUs: a Tesla K40c and an NVIDIA GeForce GTX 650. The problem seems to be with the second one. Running ] test CUDA gives the following error:

┌ Info: System information:
│ CUDA toolkit 11.4.1, artifact installation
│ CUDA driver 11.4.0
│ NVIDIA driver 470.57.2
│ 
│ Libraries: 
│ - CUBLAS: 11.5.4
│ - CURAND: 10.2.5
│ - CUFFT: 10.5.1
│ - CUSOLVER: 11.2.0
│ - CUSPARSE: 11.6.0
│ - CUPTI: 14.0.0
│ - NVML: 11.0.0+470.57.2
│ - CUDNN: 8.20.2 (for CUDA 11.4.0)
│ - CUTENSOR: 1.3.0 (for CUDA 11.2.0)
│ 
│ Toolchain:
│ - Julia: 1.6.2
│ - LLVM: 11.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
│ - Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
│ 
│ 2 devices:
│   0: Tesla K40c (sm_35, 11.107 GiB / 11.173 GiB available)
└   1: NVIDIA GeForce GTX 650 (sm_30, 900.750 MiB / 978.188 MiB available)
┌ Warning: Your NVIDIA GeForce GTX 650 GPU does not meet the minimal required compute capability (3.0.0 < 3.5).
│ Some functionality might be unavailable.
└ @ CUDA ~/.julia/packages/CUDA/9T5Sq/src/state.jl:237
ERROR: LoadError: BoundsError: attempt to access 1-element Vector{Any} at index [0:1]
Stacktrace:
 [1] throw_boundserror(A::Vector{Any}, I::Tuple{UnitRange{Int64}})
   @ Base ./abstractarray.jl:651
 [2] checkbounds
   @ ./abstractarray.jl:616 [inlined]
 [3] getindex(A::Vector{Any}, I::UnitRange{Int64})
   @ Base ./array.jl:807
 [4] top-level scope
   @ ~/.julia/packages/CUDA/9T5Sq/test/runtests.jl:158
 [5] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [6] top-level scope
   @ none:6
in expression starting at /home/jpereira/.julia/packages/CUDA/9T5Sq/test/runtests.jl:158
ERROR: Package CUDA errored during testing

Running the CUDA.jl introductory example gives a similar error:

julia> using CUDA
julia> N = 2^20;                     # N as defined in the CUDA.jl introduction
julia> x_d = CUDA.fill(1.0f0, N);    # vector of 1f0 on the GPU
julia> y_d = CUDA.fill(2.0f0, N);    # vector of 2f0 on the GPU
julia> y_d .+= x_d
ERROR: Device capability v3.0.0 not supported by available toolchain

Although I am not sure, it seems the issue is that the NVIDIA GeForce GTX 650 only has compute capability 3.0, below the required minimum of 3.5 (according to https://developer.nvidia.com/cuda-gpus).

Is there a way for me to just ignore this GPU and use the Tesla K40c only?
Thank you

Select a device at the start of your session using device!, or use the CUDA_VISIBLE_DEVICES environment variable.
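
For example, a minimal sketch of the first approach (assuming the Tesla K40c shows up as device 0, as in your output):

using CUDA
device!(0)       # select device 0, the Tesla K40c
CUDA.device()    # confirm which device is active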

The bounds error is strange though, could you file an issue with more details?

I’ve tried selecting the device with device!, but the error complaining about the lack of compute capability remains.
How can I set the CUDA_VISIBLE_DEVICES environment variable?

That depends on your platform and how you execute Julia.

On Linux you just do CUDA_VISIBLE_DEVICES=1 julia in your shell.

I’m working on Linux, Ubuntu 18.04.
Since the Tesla K40c is device 0, I used CUDA_VISIBLE_DEVICES=0 julia. It now shows:

julia> CUDA.devices()
CUDA.DeviceIterator() for 1 devices:
0. Tesla K40c

So the NVIDIA GeForce GTX 650 is ignored. I was now able to successfully run the tests. Although they take forever:

                                          |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                             (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization                        (2) |     7.70 |   0.00 |  0.0 |       0.00 |    62.88 |   0.22 |  2.8 |     548.43 |   835.81 |
gpuarrays/indexing scalar             (2) |    57.20 |   0.00 |  0.0 |       0.01 |    69.12 |   1.63 |  2.9 |    4499.45 |   835.81 |
gpuarrays/reductions/reducedim!       (2) |   189.36 |   0.01 |  0.0 |       1.03 |    70.12 |  10.21 |  5.4 |   18727.28 |   914.50 |

A follow-up question now is: how can I set this environment variable by default, on this machine?

That again depends on your environment. Check the documentation of your shell; you could e.g. add it to ~/.profile.
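
For example, assuming a bash-style shell and that the Tesla K40c stays at device 0:

# in ~/.profile: only expose device 0 (the Tesla K40c) to CUDA
export CUDA_VISIBLE_DEVICES=0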

Better to run Julia with --threads=auto; the CUDA.jl tests will then make use of all your cores, reducing the execution time from e.g. 1h30 to 5min on my 32-core machine :slight_smile:
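
For example, a quick way to check that the flag took effect:

julia --threads=auto -e 'println(Threads.nthreads())'   # prints one thread per CPU core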

I’ve added the --threads=auto flag, and indeed it was dramatically faster. Is there a way to always use this flag (other than a bash alias)? (Should I always use this flag?)

I’ve marked the above answer as correct. Setting the CUDA_VISIBLE_DEVICES environment variable (in the .bash_profile, in my case) solves this particular issue.

There have been some errors in the tests, but I think that’s something I can deal with / live with ahah
Thank you so much for your help, @maleadt !

As an aside, if anyone is running on a shared HPC system with multiple GPUs, one common way to handle this is with cgroups.
The batch scheduler will create a cgroup for you and assign memory, CPUs, and GPU devices.
Your CUDA_VISIBLE_DEVICES variable will then match the GPU that is ‘your GPU’.
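
A minimal sketch of what that can look like with Slurm (the resource numbers and script name are hypothetical):

#!/bin/bash
#SBATCH --gres=gpu:1        # request one GPU from the scheduler
#SBATCH --cpus-per-task=8   # hypothetical CPU count

# inside the job, CUDA_VISIBLE_DEVICES is typically already set
# to the allocated device, so no manual selection is needed
julia --threads=auto myscript.jl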