Could not download forward compatibility package for CUDA

I am trying to run some CUDA calculations on a cluster, and I have a problem with CUDA artifacts. Here is a simple test script that I run on a cluster node.

# Use the CUDA toolkit installed on the cluster instead of downloading artifacts.
ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"
using CUDA
CUDA.versioninfo()
println("")
a = CuArray([1, 2])
print(a)

The code gives me the following output, which seems to be correct:

CUDA toolkit 11.6, local installation
NVIDIA driver 460.32.3, for CUDA 11.2
CUDA driver 11.2

Libraries: 
- CUBLAS: 11.9.2
- CURAND: 10.2.9
- CUFFT: 10.7.2
- CUSOLVER: 11.3.4
- CUSPARSE: 11.7.2
- CUPTI: 16.0.0
- NVML: 11.0.0+460.32.3
- CUDNN: 8.10.1 (for CUDA 11.2.0)
- CUTENSOR: missing

Toolchain:
- Julia: 1.6.7
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

Environment:
- JULIA_CUDA_USE_BINARYBUILDER: false

2 devices:
  0: Tesla K40m (sm_35, 11.918 GiB / 11.921 GiB available)
  1: Tesla K40m (sm_35, 11.918 GiB / 11.921 GiB available)

[1, 2]

But in addition I get some error messages in the debug output. Is this OK? What is wrong with CUDA_compat? Can I get rid of these "Downloading artifact: CUDA_compat" messages?

 Downloading artifact: CUDA_compat
 Downloading artifact: CUDA_compat
β”Œ Debug: Could not download forward compatibility package
β”‚   exception =
β”‚    Unable to automatically install 'CUDA_compat' from '/home/.julia/packages/CUDA/DfvRa/Artifacts.toml'
β”‚    Stacktrace:
β”‚      [1] error(s::String)
β”‚        @ Base ./error.jl:33
β”‚      [2] ensure_artifact_installed(name::String, meta::Dict{String, Any}, artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, verbose::Bool, quiet_download::Bool, io::Base.PipeEndpoint)
β”‚        @ Pkg.Artifacts /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Artifacts.jl:446
β”‚      [3] ensure_artifact_installed(name::String, artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, pkg_uuid::Nothing, verbose::Bool, quiet_download::Bool, io::Base.PipeEndpoint)
β”‚        @ Pkg.Artifacts /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Artifacts.jl:409
β”‚      [4] _artifact_str(__module__::Module, artifacts_toml::String, name::SubString{String}, path_tail::String, artifact_dict::Dict{String, Any}, hash::Base.SHA1, platform::Base.BinaryPlatforms.Platform, lazyartifacts::Any)
β”‚        @ Artifacts /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Artifacts/src/Artifacts.jl:548
β”‚      [5] #invokelatest#2
β”‚        @ ./essentials.jl:708 [inlined]
β”‚      [6] invokelatest
β”‚        @ ./essentials.jl:706 [inlined]
β”‚      [7] macro expansion
β”‚        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Artifacts/src/Artifacts.jl:690 [inlined]
β”‚      [8] (::CUDA.var"#1#4")(id::String)
β”‚        @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/CUDAdrv.jl:97
β”‚      [9] find_libcuda()
β”‚        @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/CUDAdrv.jl:98
β”‚     [10] libcuda()
β”‚        @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/CUDAdrv.jl:141
β”‚     [11] macro expansion
β”‚        @ ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:67 [inlined]
β”‚     [12] macro expansion
β”‚        @ lock.jl:209 [inlined]
β”‚     [13] toolkit()
β”‚        @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:21
β”‚     [14] macro expansion
β”‚        @ ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:342 [inlined]
β”‚     [15] macro expansion
β”‚        @ lock.jl:209 [inlined]
β”‚     [16] libcudart()
β”‚        @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:21
β”‚     [17] macro expansion
β”‚        @ ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/error.jl:95 [inlined]
β”‚     [18] runtime_version()
β”‚        @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/version.jl:44
β”‚     [19] versioninfo(io::Base.PipeEndpoint) (repeats 2 times)
β”‚        @ CUDA ~/.julia/packages/CUDA/DfvRa/src/utilities.jl:32
β”‚     [20] top-level scope
β”‚        @ ~/julia/Test_CUDA/test_CUDA.jl:5
β”‚     [21] include(mod::Module, _path::String)
β”‚        @ Base ./Base.jl:384
β”‚     [22] exec_options(opts::Base.JLOptions)
β”‚        @ Base ./client.jl:285
β”‚     [23] _start()
β”‚        @ Base ./client.jl:485
β”” @ CUDA ~/.julia/packages/CUDA/DfvRa/lib/cudadrv/CUDAdrv.jl:100
β”Œ Debug: Trying to use local installation...
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:210
β”Œ Debug: Looking for CUDA toolkit via environment variables CUDA_PATH, CUDA_HOME
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:271
β”Œ Debug: Looking for binary nvdisasm in /opt/software/nvidia/cuda-11.6u2
β”‚   all_locations =
β”‚    2-element Vector{String}:
β”‚     "/opt/software/nvidia/cuda-11.6u2"
β”‚     "/opt/software/nvidia/cuda-11.6u2/bin"
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:147
β”Œ Debug: Found nvdisasm at /opt/software/nvidia/cuda-11.6u2/bin/nvdisasm
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:153
β”Œ Debug: Looking for library cudart, no specific version, in /opt/software/nvidia/cuda-11.6u2
β”‚   all_names =
β”‚    1-element Vector{String}:
β”‚     "libcudart.so"
β”‚   all_locations =
β”‚    4-element Vector{String}:
β”‚     "/opt/software/nvidia/cuda-11.6u2"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib64"
β”‚     "/opt/software/nvidia/cuda-11.6u2/libx64"
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:119
β”Œ Debug: Found libcudart.so at /opt/software/nvidia/cuda-11.6u2/lib64
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:128
β”Œ Debug: Found local CUDA 11.6.0 at /opt/software/nvidia/cuda-11.6u2
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/bindeps.jl:232
β”Œ Debug: Looking for library cudart, no specific version, in /opt/software/nvidia/cuda-11.6u2
β”‚   all_names =
β”‚    1-element Vector{String}:
β”‚     "libcudart.so"
β”‚   all_locations =
β”‚    4-element Vector{String}:
β”‚     "/opt/software/nvidia/cuda-11.6u2"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib64"
β”‚     "/opt/software/nvidia/cuda-11.6u2/libx64"
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:119
β”Œ Debug: Found libcudart.so at /opt/software/nvidia/cuda-11.6u2/lib64
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:128
β”Œ Debug: Looking for library cublasLt, no specific version, in /opt/software/nvidia/cuda-11.6u2
β”‚   all_names =
β”‚    1-element Vector{String}:
β”‚     "libcublasLt.so"
β”‚   all_locations =
β”‚    4-element Vector{String}:
β”‚     "/opt/software/nvidia/cuda-11.6u2"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib64"
β”‚     "/opt/software/nvidia/cuda-11.6u2/libx64"
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:119
β”Œ Debug: Found libcublasLt.so at /opt/software/nvidia/cuda-11.6u2/lib64
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:128
β”Œ Debug: Looking for library cublas, no specific version, in /opt/software/nvidia/cuda-11.6u2
β”‚   all_names =
β”‚    1-element Vector{String}:
β”‚     "libcublas.so"
β”‚   all_locations =
β”‚    4-element Vector{String}:
β”‚     "/opt/software/nvidia/cuda-11.6u2"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib"
β”‚     "/opt/software/nvidia/cuda-11.6u2/lib64"
β”‚     "/opt/software/nvidia/cuda-11.6u2/libx64"
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:119
β”Œ Debug: Found libcublas.so at /opt/software/nvidia/cuda-11.6u2/lib64
β”” @ CUDA.Deps ~/.julia/packages/CUDA/DfvRa/deps/discovery.jl:128
β”Œ Debug:  cuBLAS (v11.5) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
β”‚   type: type=SOME TYPE; val=0
β”‚   value: type=int; val=POINTER (IN HEX:0x0x2b688951add0)
β”‚  Time: 2022-12-26T14:44:46 elapsed from start 0.016667 minutes or 1.000000 seconds
β”‚ Process=150087; Thread=47727540146624; GPU=0; Handle=POINTER (IN HEX:0x(nil))
β”‚  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
β”” @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/cublas/CUBLAS.jl:222
β”Œ Debug:  cuBLAS (v11.5) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
β”‚   type: type=SOME TYPE; val=1
β”‚   value: type=int; val=POINTER (IN HEX:0x0x2b6889c20c10)
β”‚  Time: 2022-12-26T14:44:46 elapsed from start 0.016667 minutes or 1.000000 seconds
β”‚ Process=150087; Thread=47727540146624; GPU=0; Handle=POINTER (IN HEX:0x(nil))
β”‚  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
β”” @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/cublas/CUBLAS.jl:222
β”Œ Debug:  cuBLAS (v11.5) function cublasStatus_t cublasGetProperty(libraryPropertyType, int*) called:
β”‚   type: type=SOME TYPE; val=2
β”‚   value: type=int; val=POINTER (IN HEX:0x0x2b6889c20c20)
β”‚  Time: 2022-12-26T14:44:46 elapsed from start 0.016667 minutes or 1.000000 seconds
β”‚ Process=150087; Thread=47727540146624; GPU=0; Handle=POINTER (IN HEX:0x(nil))
β”‚  COMPILED WITH: GNU GCC/G++ / 6.3.1 20170216 (Red Hat 6.3.1-3)
β”‚ 

I am not too much into the details, but I noticed this.

By setting this, I think it wants you to install, or point it to, your own CUDA installation?

This is at least what I get from reading the CUDA.jl docs.

Kind regards

When you work on cluster nodes, you usually have to use the CUDA installation that is already present on the cluster. I have loaded it as modules:

module load nvidia/cuda/11.6u2
module load nvidia/cudnn/8.1.1

As far as I know from the docs and forum messages, in this case you have to switch off JULIA_CUDA_USE_BINARYBUILDER and let CUDA.jl use the locally installed binaries. I can remove that line, but then it tries to download all the CUDA artifacts, not only CUDA_compat.
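
For reference, this is what I mean (a minimal sketch; the key point is that the variable has to be set before `using CUDA`, e.g. at the top of the job script):

# Must run before `using CUDA`, otherwise the setting is ignored.
ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"   # use the toolkit provided by `module load`
using CUDA
CUDA.functional() || error("the local CUDA toolkit was not picked up")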

Just in case anybody needs it: I have found a solution.
In the file "~/.julia/packages/CUDA/DfvRa/lib/cudadrv/CUDAdrv.jl" there is an `if` statement that controls the CUDA_compat download. Apparently the CUDA version supported by my driver (11.2) is below 11.7, so CUDAdrv.jl tries to download the forward-compatibility package:

    if getenv("JULIA_CUDA_USE_COMPAT", !hooked && !system_driver_loaded) && _system_version[] < v"11.7"

I solved it easily by just switching off the JULIA_CUDA_USE_COMPAT environment variable:

export JULIA_CUDA_USE_BINARYBUILDER=false
export JULIA_CUDA_USE_COMPAT=false
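
The same can be done from inside Julia instead of the shell (a sketch equivalent to the exports above; it has to run before CUDA.jl is loaded):

ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"  # use the locally installed toolkit
ENV["JULIA_CUDA_USE_COMPAT"] = "false"         # skip the forward-compatibility download
using CUDA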

Why would you disable the attempt to load the forward-compatibility driver? The debug message mentioned that your system isn't compatible with it, nothing more. It's just a debug message.

The user-accessible cluster nodes have GPUs but no internet access. I can install Julia packages on a login node, but I cannot download anything when I run code on the GPU nodes.
As far as I can tell from message boards, this is not such a rare situation.
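
The usual workaround is to prepare everything on the login node while the network is available (a sketch, assuming the project lives at ~/julia/Test_CUDA as in the stack trace above); note that lazy artifacts such as CUDA_compat can still be requested at run time, which is exactly the failed download shown earlier.

# Run on the login node, where downloads work; the GPU nodes then reuse
# the same ~/.julia depot without needing network access.
using Pkg
Pkg.activate(joinpath(homedir(), "julia", "Test_CUDA"))  # project path is an assumption
Pkg.instantiate()  # fetches packages and non-lazy artifacts up front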

In my case, the unsuccessful attempt to download CUDA_compat adds about 20 seconds to the startup time, so I prefer to switch it off. I think the correct solution is to ask the cluster admin to update the CUDA driver, but that could take a while.
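
A rough way to see where the time goes (a sketch; run each variant in a fresh Julia session and compare the timings):

ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"
# ENV["JULIA_CUDA_USE_COMPAT"] = "false"   # uncomment for the second run
@time using CUDA             # package load, including any artifact download attempts
@time CUDA.versioninfo()     # first call triggers toolkit discovery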

That would be very surprising. Are you sure you're not conflating the failed download of CUDA_compat with the other artifacts? The latter is retried a bunch of times, once for each CUDA version, so it might take a while to fail all the way. Or is this a feature of your environment (e.g., connections timing out instead of being rejected)?

I believe your last suggestion is correct. I tried running wget on a GPU node. Here is the log; there was the very same delay of approximately 20 seconds.

$ wget https://google.com
--2023-01-05 11:51:18--  https://google.com/
Resolving google.com (google.com)... 108.177.14.101, 108.177.14.138, 108.177.14.139, ...
Connecting to google.com (google.com)|108.177.14.101|:443... failed: No route to host.
Connecting to google.com (google.com)|108.177.14.138|:443... failed: No route to host.
Connecting to google.com (google.com)|108.177.14.139|:443... failed: No route to host.
Connecting to google.com (google.com)|108.177.14.102|:443... failed: No route to host.
Connecting to google.com (google.com)|108.177.14.113|:443... failed: No route to host.
Connecting to google.com (google.com)|108.177.14.100|:443... failed: No route to host.
Connecting to google.com (google.com)|2a00:1450:4010:c0f::66|:443... failed: Network is unreachable.
Connecting to google.com (google.com)|2a00:1450:4010:c0f::8a|:443... failed: Network is unreachable.
Connecting to google.com (google.com)|2a00:1450:4010:c0f::8b|:443... failed: Network is unreachable.
Connecting to google.com (google.com)|2a00:1450:4010:c0f::65|:443... failed: Network is unreachable.