CUDA package having issues due to a mismatch between the CUDA toolkit version and the NVIDIA driver's CUDA version

I have the CUDA toolkit for CUDA 13.0 and my driver reports CUDA 13.1, which is why I am getting this error in Julia:

┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
│ This is unexpected; please file an issue.

When I try to use a local install of CUDA, I get this error:

┌ Error: CUDA.jl was precompiled without knowing the CUDA toolkit version. This is unsupported.
│ You should either precompile CUDA.jl in an environment where the CUDA toolkit is available,
│ or call `CUDA.set_runtime_version!` to specify which CUDA version to use.
└ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:139
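As the second error message suggests, one workaround is to pin the runtime version that CUDA.jl should use. A minimal sketch, assuming the driver's 13.1 from the first error is the version you want (this writes a preference and takes effect after restarting Julia):

```julia
using CUDA

# Pin CUDA.jl to the runtime matching the driver's CUDA version.
# The specific version string (v"13.1") is an assumption based on the
# error message above; restart Julia afterwards so it takes effect.
CUDA.set_runtime_version!(v"13.1")

# Alternatively, if you want CUDA.jl to use a locally installed toolkit
# instead of the artifact-provided one:
# CUDA.set_runtime_version!(local_toolkit=true)
```

Either call triggers a recompilation of CUDA.jl against the specified runtime on the next session.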

You may try `resolve` in Pkg mode (enter it by pressing `]` at the REPL):

(@v1.12) pkg> resolve

(@v1.12) pkg> ?resolve
Resolve the project i.e. run package resolution and update the Manifest. This is useful in case the dependencies of developed packages have changed causing the current Manifest to be out of sync.


`resolve` is not working:

(@v1.12) pkg> resolve
     Project No packages added to or removed from `C:\Users\Aditya\.julia\environments\v1.12\Project.toml`
    Manifest No packages added to or removed from `C:\Users\Aditya\.julia\environments\v1.12\Manifest.toml`

Isn’t CUDA supposed to be backward compatible within a major version, that is, a newer driver works with older code? Also, I changed the tags; this was previously categorized under Pkg, which is not directly responsible for this.

No.
A GPU that supports a newer version of CUDA also supports older versions, but software is built against a specific version of CUDA and usually does not work with any other version.
More on forward compatibility Forward Compatibility — CUDA Compatibility

Can you still use CUDA.jl despite the error? I seem to remember that before resolving I couldn’t, but everything seems to work after:

julia> @usingany CUDA  # (or using in the proper environment; from ShareAdd.jl)
┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
│ This is unexpected; please file an issue.
└ @ CUDA (...)\.julia\packages\CUDA\x8d2s\src\initialization.jl:148

julia> CUDA.rand(2, 3) * CUDA.rand(3, 4)
2×4 CuArray{Float32, 2, CUDA.DeviceMemory}:
 1.37045  1.31949  0.897917  1.30481
 1.32174  1.37154  0.495007  1.35085
Versions
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b73 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto

julia> CUDA.versioninfo()
CUDA toolchain:
- runtime 13.1, artifact installation
- driver 591.59.0 for 13.1
- compiler 13.1

CUDA libraries:
- CUBLAS: 13.1.0
- CURAND: 10.4.0
- CUFFT: 12.0.0
- CUSOLVER: 12.0.4
- CUSPARSE: 12.6.3
- CUPTI: 2025.3.1 (API 13.0.1)
- NVML: 13.0.0+591.59

Julia packages:
- CUDA: 5.9.5
- CUDA_Driver_jll: 13.1.0+0
- CUDA_Compiler_jll: 0.3.0+1
- CUDA_Runtime_jll: 0.19.2+0

Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 3070 (sm_86, 6.173 GiB / 8.000 GiB available)

Yes, I can still use CUDA, but not like this:

using CUDA  # optional
device = gpu_device()  # function to move data and model to the GPU

This is code from the Flux.jl documentation, and it falls back to the CPU because CUDA is reported as not functional.
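A quick way to confirm why `gpu_device()` falls back to the CPU is to check CUDA.jl directly. A small sketch (`CUDA.functional()` is the standard check; the printed messages are illustrative):

```julia
using CUDA

# CUDA.functional() returns false when CUDA.jl failed to initialize,
# e.g. due to a runtime/driver version mismatch like the one above.
if CUDA.functional()
    println("GPU available: ", CUDA.name(CUDA.device()))
else
    println("CUDA not functional; gpu_device() will fall back to the CPU")
end
```

If this prints the fallback message, Flux's `gpu_device()` will return a CPU device regardless of the hardware present.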

I can use CUDA like this:

using CUDA

a = rand(1000)          # CPU array
b = CuArray(a)          # move to GPU

c = b .^ 2 .+ 1         # runs on GPU
result = Array(c)      # bring back to CPU

Moreover, after testing CUDA I got this:

Testing finished in 40 minutes, 1 second, 826 milliseconds

Test Summary: |  Pass  Broken  Total  Time
  Overall     | 33900      27  33927
    SUCCESS
     Testing CUDA tests passed

Here is the full report:

(@v1.12) pkg> test CUDA
     Testing CUDA
      Status `C:\Users\Aditya\AppData\Local\Temp\jl_Wlsbkx\Project.toml`
  [621f4979] AbstractFFTs v1.5.0
  [79e6a3ab] Adapt v4.4.0
  [ab4f0b2a] BFloat16s v0.6.0
  [052768ef] CUDA v5.9.5
  [d360d2e6] ChainRulesCore v1.26.0
  [864edb3b] DataStructures v0.19.3
  [7a1cc6ca] FFTW v1.10.0
  [0c68f7d7] GPUArrays v11.3.2
  [61eb1bfa] GPUCompiler v1.7.5
  [a98d9a8b] Interpolations v0.16.2
  [033835bb] JLD2 v0.6.3
  [63c18a36] KernelAbstractions v0.9.39
  [5da4648a] NVTX v1.0.1
  [a0a7dd2c] SparseMatricesCSR v0.6.9
  [276daf66] SpecialFunctions v2.6.1
  [90137ffa] StaticArrays v1.9.15
  [10745b16] Statistics v1.11.1
  [4ee394cb] CUDA_Driver_jll v13.1.0+0
  [76a88914] CUDA_Runtime_jll v0.19.2+0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [37e2e46d] LinearAlgebra v1.12.0
  [44cfe95a] Pkg v1.12.1
  [de0858da] Printf v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [2f01184e] SparseArrays v1.12.0
  [8dfed614] Test v1.11.0
      Status `C:\Users\Aditya\AppData\Local\Temp\jl_Wlsbkx\Manifest.toml`
  [621f4979] AbstractFFTs v1.5.0
  [79e6a3ab] Adapt v4.4.0
  [a9b6321e] Atomix v1.1.2
  [13072b0f] AxisAlgorithms v1.1.0
  [ab4f0b2a] BFloat16s v0.6.0
  [fa961155] CEnum v0.5.0
  [052768ef] CUDA v5.9.5
  [1af6417a] CUDA_Runtime_Discovery v1.0.0
  [d360d2e6] ChainRulesCore v1.26.0
  [0b6fb165] ChunkCodecCore v1.0.1
  [4c0bbee4] ChunkCodecLibZlib v1.0.0
  [55437552] ChunkCodecLibZstd v1.0.0
  [3da002f7] ColorTypes v0.12.1
  [5ae59095] Colors v0.13.1
  [34da2185] Compat v4.18.1
  [a8cc5b0e] Crayons v4.1.1
  [9a962f9c] DataAPI v1.16.0
  [a93c6f00] DataFrames v1.8.1
  [864edb3b] DataStructures v0.19.3
  [e2d170a0] DataValueInterfaces v1.0.0
  [ffbed154] DocStringExtensions v0.9.5
  [e2ba6199] ExprTools v0.1.10
  [7a1cc6ca] FFTW v1.10.0
  [5789e2e9] FileIO v1.17.1
  [53c48c17] FixedPointNumbers v0.8.5
  [0c68f7d7] GPUArrays v11.3.2
  [46192b85] GPUArraysCore v0.2.0
  [61eb1bfa] GPUCompiler v1.7.5
  [096a3bc2] GPUToolbox v1.0.0
  [076d061b] HashArrayMappedTries v0.2.0
  [842dd82b] InlineStrings v1.4.5
  [a98d9a8b] Interpolations v0.16.2
  [41ab1584] InvertedIndices v1.3.1
  [92d709cd] IrrationalConstants v0.2.6
  [82899510] IteratorInterfaceExtensions v1.0.0
  [033835bb] JLD2 v0.6.3
  [692b3bcd] JLLWrappers v1.7.1
  [63c18a36] KernelAbstractions v0.9.39
  [929cbde3] LLVM v9.4.4
  [8b046642] LLVMLoopInfo v1.0.0
  [b964fa9f] LaTeXStrings v1.4.0
  [2ab3a3ac] LogExpFunctions v0.3.29
  [1914dd2f] MacroTools v0.5.16
  [e1d29d7a] Missings v1.2.0
  [5da4648a] NVTX v1.0.1
  [6fe1bfb0] OffsetArrays v1.17.0
  [bac558e1] OrderedCollections v1.8.1
  [2dfb63ee] PooledArrays v1.4.3
  [aea7be01] PrecompileTools v1.3.3
  [21216c6a] Preferences v1.5.1
  [08abe8d2] PrettyTables v3.1.2
  [74087812] Random123 v1.7.1
  [e6cf234a] RandomNumbers v1.6.0
  [c84ed2f1] Ratios v0.4.5
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.1
  [7e506255] ScopedValues v1.5.0
  [6c6a2e73] Scratch v1.3.0
  [91c51154] SentinelArrays v1.4.9
  [a2af1166] SortingAlgorithms v1.2.2
  [a0a7dd2c] SparseMatricesCSR v0.6.9
  [276daf66] SpecialFunctions v2.6.1
  [90137ffa] StaticArrays v1.9.15
  [1e83bf80] StaticArraysCore v1.4.4
  [10745b16] Statistics v1.11.1
  [892a3eda] StringManipulation v0.4.2
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.12.1
  [e689c965] Tracy v0.1.6
  [013be700] UnsafeAtomics v0.3.0
  [efce3f68] WoodburyMatrices v1.0.0
  [d1e2174e] CUDA_Compiler_jll v0.3.0+1
  [4ee394cb] CUDA_Driver_jll v13.1.0+0
  [76a88914] CUDA_Runtime_jll v0.19.2+0
  [f5851436] FFTW_jll v3.3.11+0
  [1d5cc7b8] IntelOpenMP_jll v2025.2.0+0
  [9c1d0b0a] JuliaNVTXCallbacks_jll v0.2.1+0
  [dad2f222] LLVMExtra_jll v0.0.38+0
  [ad6e5548] LibTracyClient_jll v0.9.1+6
  [856f044c] MKL_jll v2025.2.0+0
  [e98f9f5b] NVTX_jll v3.2.2+0
  [efe28fd5] OpenSpecFun_jll v0.5.6+0
  [3161d3a3] Zstd_jll v1.5.7+1
  [1e29f10c] demumble_jll v1.3.0+0
  [1317d2d5] oneTBB_jll v2022.0.0+1
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [f43a241f] Downloads v1.7.0
  [7b1f6079] FileWatching v1.11.0
  [9fa8497b] Future v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [ac6e5ff7] JuliaSyntaxHighlighting v1.12.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.12.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.3.0
  [44cfe95a] Pkg v1.12.1
  [de0858da] Printf v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [1a1011a3] SharedArrays v1.11.0
  [6462fe0b] Sockets v1.11.0
  [2f01184e] SparseArrays v1.12.0
  [f489334b] StyledStrings v1.11.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test v1.11.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.3.0+1
  [deac9b47] LibCURL_jll v8.15.0+0
  [e37daf67] LibGit2_jll v1.9.0+0
  [29816b5a] LibSSH2_jll v1.11.3+1
  [14a3606d] MozillaCACerts_jll v2025.5.20
  [4536629a] OpenBLAS_jll v0.3.29+0
  [05823500] OpenLibm_jll v0.8.7+0
  [458c3c95] OpenSSL_jll v3.5.4+0
  [bea87d4a] SuiteSparse_jll v7.8.3+2
  [83775a58] Zlib_jll v1.3.1+2
  [8e850b90] libblastrampoline_jll v5.15.0+0
  [8e850ede] nghttp2_jll v1.64.0+1
  [3f19e933] p7zip_jll v17.7.0+0
     Testing Running tests...
┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
│ This is unexpected; please file an issue.
└ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:148
┌ Info: System information:
│ CUDA toolchain:
│ - runtime 13.1, artifact installation
│ - driver 591.44.0 for 13.1
│ - compiler 13.1
│
│ CUDA libraries:
│ - CUBLAS: 13.1.0
│ - CURAND: 10.4.0
│ - CUFFT: 12.0.0
│ - CUSOLVER: 12.0.4
│ - CUSPARSE: 12.6.3
│ - CUPTI: 2025.3.1 (API 13.0.1)
│ - NVML: 13.0.0+591.44
│
│ Julia packages:
│ - CUDA: 5.9.5
│ - CUDA_Driver_jll: 13.1.0+0
│ - CUDA_Compiler_jll: 0.3.0+1
│ - CUDA_Runtime_jll: 0.19.2+0
│
│ Toolchain:
│ - Julia: 1.12.3
│ - LLVM: 18.1.7
│
│ 1 device:
└   0: NVIDIA GeForce RTX 4050 Laptop GPU (sm_89, 5.340 GiB / 5.997 GiB available)
[ Info: Testing using device 0 (NVIDIA GeForce RTX 4050 Laptop GPU). To change this, specify the `--gpu` argument to the tests, or set the `CUDA_VISIBLE_DEVICES` environment variable.
[ Info: Running 2 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
      From worker 2:    ┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
      From worker 2:    │ This is unexpected; please file an issue.
      From worker 2:    └ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:148
      From worker 3:    ┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
      From worker 3:    │ This is unexpected; please file an issue.
      From worker 3:    └ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:148
                                                  |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
core/initialization                           (2) |    17.48 |   1.39 |  7.9 |       0.00 |      N/A |   0.00 |  0.0 |      86.45 |  1219.08 |
      From worker 4:    ┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
      From worker 4:    │ This is unexpected; please file an issue.
      From worker 4:    └ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:148
gpuarrays/reductions/sum prod                 (3) |   141.57 |   0.02 |  0.0 |       3.23 |      N/A |   2.86 |  2.0 |   12493.46 |  2879.59 |
gpuarrays/reductions/reduce                   (4) |   109.33 |   0.02 |  0.0 |       1.53 |      N/A |   2.37 |  2.2 |   10297.67 |  1900.52 |
gpuarrays/reductions/mapreducedim!            (3) |    80.19 |   0.01 |  0.0 |       1.54 |      N/A |   1.20 |  1.5 |    5900.76 |  3526.55 |
gpuarrays/reductions/== isequal               (3) |    51.65 |   0.01 |  0.0 |       1.07 |      N/A |   1.02 |  2.0 |    5121.06 |  4014.48 |
gpuarrays/broadcasting                        (4) |   148.60 |   0.02 |  0.0 |       2.00 |      N/A |   3.02 |  2.0 |   12539.38 |  3385.03 |
gpuarrays/base                                (3) |    34.57 |   0.00 |  0.0 |       8.90 |      N/A |   1.13 |  3.3 |    2875.04 |  4531.65 |
gpuarrays/vectors                             (3) |     0.43 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      21.62 |  4531.65 |
gpuarrays/random                              (4) |    13.92 |   0.01 |  0.0 |     392.05 |      N/A |   0.31 |  2.2 |    1373.38 |  3689.95 |
gpuarrays/ext/jld2                            (3) |    16.77 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     523.58 |  4576.77 |
gpuarrays/constructors                        (4) |    21.80 |   0.01 |  0.0 |       0.65 |      N/A |   0.27 |  1.2 |    1390.27 |  3853.25 |
gpuarrays/statistics                          (4) |    51.43 |   0.00 |  0.0 |       1.50 |      N/A |   0.98 |  1.9 |    4735.85 |  4653.82 |
gpuarrays/reductions/mapreduce                (3) |    70.92 |   0.01 |  0.0 |       1.83 |      N/A |   1.04 |  1.5 |    5731.24 |  5007.89 |
gpuarrays/linalg/NaN_false                    (3) |    14.77 |   0.00 |  0.0 |       0.00 |      N/A |   0.12 |  0.8 |     968.52 |  5593.66 |
gpuarrays/math/intrinsics                     (3) |     1.91 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     158.35 |  5593.66 |
gpuarrays/linalg/mul!/matrix-matrix           (3) |    61.20 |   0.02 |  0.0 |       0.13 |      N/A |   0.91 |  1.5 |    5332.84 |  6213.73 |
gpuarrays/sparse                              (3) |     0.00 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       0.05 |  6218.92 |
gpuarrays/reductions/mapreducedim!_large      (3) |    10.02 |   0.00 |  0.0 |     818.34 |      N/A |   0.47 |  4.7 |    2036.99 |  6597.79 |
gpuarrays/uniformscaling                      (3) |     8.05 |   0.00 |  0.0 |       0.01 |      N/A |   0.00 |  0.0 |     447.28 |  6597.79 |
gpuarrays/linalg/norm                         (4) |   139.38 |   0.01 |  0.0 |       0.02 |      N/A |   2.51 |  1.8 |   11376.25 |  7053.47 |
gpuarrays/reductions/any all count            (4) |    10.29 |   0.00 |  0.0 |       0.00 |      N/A |   0.12 |  1.2 |    1008.55 |  7363.41 |
gpuarrays/linalg/kron                         (4) |    82.23 |   0.01 |  0.0 |    5381.25 |      N/A |   7.29 |  8.9 |   30032.35 |  8386.31 |
gpuarrays/reductions/minimum maximum extrema  (3) |   146.98 |   0.02 |  0.0 |       2.19 |      N/A |   2.73 |  1.9 |   13187.27 |  8341.30 |
gpuarrays/indexing multidimensional           (4) |    47.05 |   0.00 |  0.0 |       2.07 |      N/A |   0.52 |  1.1 |    3084.54 |  8528.57 |
gpuarrays/indexing find                       (3) |    26.03 |   0.00 |  0.0 |       0.13 |      N/A |   0.49 |  1.9 |    2367.11 |  8774.89 |
gpuarrays/linalg/diagonal                     (4) |    20.20 |   0.00 |  0.0 |       2.34 |      N/A |   0.23 |  1.1 |    1546.28 |  9074.04 |
gpuarrays/linalg/mul!/vector-matrix           (3) |    31.45 |   0.01 |  0.0 |       0.02 |      N/A |   0.36 |  1.2 |    2458.67 |  9066.32 |
gpuarrays/math/power                          (4) |    29.93 |   0.00 |  0.0 |       0.01 |      N/A |   0.42 |  1.4 |    2680.76 |  9202.92 |
gpuarrays/reductions/reducedim!               (4) |    50.92 |   0.00 |  0.0 |       1.03 |      N/A |   0.64 |  1.3 |    4036.16 |  9507.64 |
gpuarrays/indexing scalar                     (4) |     8.29 |   0.00 |  0.0 |       0.01 |      N/A |   0.00 |  0.0 |     576.67 |  9585.64 |
gpuarrays/alloc cache                         (4) |     1.37 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     117.44 |  9586.75 |
gpuarrays/linalg/core                         (3) |   106.20 |   0.01 |  0.0 |      26.04 |      N/A |   2.81 |  2.6 |    9030.51 |  9672.86 |
libraries/cusparse                            (4) |   106.90 |   0.05 |  0.0 |      23.36 |      N/A |   1.21 |  1.1 |    6901.08 | 10227.49 |
base/array                                    (4) |    62.75 |   0.11 |  0.2 |    1264.58 |      N/A |   1.04 |  1.7 |    6671.73 | 10421.44 |
      From worker 4:    WARNING: Method definition var"#296"(Any) in module Main at C:\Users\Aditya\.julia\packages\CUDA\x8d2s\test\core\execution.jl:358 overwritten at C:\Users\Aditya\.julia\packages\CUDA\x8d2s\test\core\execution.jl:366.
core/execution                                (4) |    31.87 |   0.00 |  0.0 |       0.02 |      N/A |   0.38 |  1.2 |    2510.28 | 10421.44 |
libraries/cublas/extensions                   (4) |    45.45 |   0.02 |  0.0 |      36.69 |      N/A |   0.45 |  1.0 |    3259.62 | 10421.44 |
libraries/cusolver/dense                      (3) |   230.00 |   0.11 |  0.0 |     329.71 |      N/A |   6.35 |  2.8 |   17572.60 | 10378.38 |
core/cudadrv                                  (4) |     7.67 |   5.01 | 65.3 |       0.00 |      N/A |   0.00 |  0.0 |     286.12 | 10421.44 |
      From worker 5:    ┌ Error: You are using CUDA 13.1.0, but CUDA.jl was precompiled for CUDA 13.0.0.
      From worker 5:    │ This is unexpected; please file an issue.
      From worker 5:    └ @ CUDA C:\Users\Aditya\.julia\packages\CUDA\x8d2s\src\initialization.jl:148
libraries/cublas/level2                       (3) |    28.50 |   0.01 |  0.0 |       1.35 |      N/A |   0.39 |  1.4 |    2023.44 | 10378.38 |
libraries/cublas/level3                       (3) |    30.96 |   0.02 |  0.1 |       5.74 |      N/A |   0.86 |  2.8 |    4395.59 | 10378.38 |
libraries/cublas/level3/gemm                  (5) |    63.82 |   0.02 |  0.0 |       8.95 |      N/A |   1.23 |  1.9 |    5417.15 |  2084.90 |
libraries/cublas/xt                           (5) |    14.97 |   0.00 |  0.0 |       0.98 |      N/A |   0.15 |  1.0 |     979.77 |  2377.58 |
libraries/cusparse/generic                    (3) |    59.02 |   0.06 |  0.1 |      14.28 |      N/A |   0.69 |  1.2 |    3813.72 | 10396.62 |
core/device/intrinsics/wmma                   (3) |    65.32 |   0.01 |  0.0 |       0.63 |      N/A |   0.94 |  1.4 |    4837.66 | 11261.78 |
base/sorting                                  (5) |    81.59 |   0.01 |  0.0 |     668.44 |      N/A |   4.39 |  5.4 |   13006.34 |  5194.62 |
libraries/cufft                               (5) |    72.74 |   0.01 |  0.0 |     198.03 |      N/A |   1.18 |  1.6 |    6732.35 |  5194.62 |
core/device/intrinsics/atomics                (5) |    18.96 |   0.00 |  0.0 |       0.00 |      N/A |   0.13 |  0.7 |     992.26 |  5194.62 |
libraries/cusparse/conversions                (5) |    23.95 |   0.01 |  0.0 |       1.73 |      N/A |   0.37 |  1.6 |    2529.96 |  5194.62 |
libraries/cusolver/dense_generic              (5) |    40.67 |   0.01 |  0.0 |      55.05 |      N/A |   0.69 |  1.7 |    4201.50 |  5194.62 |
libraries/cusparse/interfaces                 (3) |   167.86 |   0.16 |  0.1 |      47.33 |      N/A |   1.69 |  1.0 |    8230.59 | 11832.97 |
core/device/intrinsics/cooperative_groups     (3) |    78.63 |   0.02 |  0.0 |      19.19 |      N/A |   0.82 |  1.0 |    2156.64 | 13298.51 |
base/texture                                  (5) |    87.17 |   0.00 |  0.0 |       0.10 |      N/A |   0.99 |  1.1 |    3681.69 |  5194.62 |
core/device/intrinsics                        (3) |    34.64 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     689.94 | 13453.25 |
libraries/cublas/level1                       (5) |    62.89 |   0.01 |  0.0 |       0.03 |      N/A |   0.47 |  0.7 |    1867.28 |  5416.89 |
core/device/array                             (5) |     9.76 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     283.67 |  5416.89 |
libraries/cusparse/bmm                        (3) |    66.85 |   0.02 |  0.0 |       0.99 |      N/A |   1.11 |  1.7 |    3454.02 | 14100.95 |
libraries/cusolver/sparse                     (3) |    20.82 |   0.00 |  0.0 |       0.22 |      N/A |   0.00 |  0.0 |     429.53 | 14100.95 |
base/random                                   (5) |    63.72 |   3.50 |  5.5 |    4352.59 |      N/A |   0.44 |  0.7 |    1743.34 |  6719.29 |
core/codegen                                  (5) |     7.46 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     164.26 |  6826.09 |
core/device/intrinsics/memory                 (3) |    20.70 |   0.00 |  0.0 |       0.02 |      N/A |   0.00 |  0.0 |     410.45 | 14206.75 |
core/device/intrinsics/output                 (3) |    24.66 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     747.03 | 14264.47 |
core/device/intrinsics/math                   (5) |    49.98 |   0.01 |  0.0 |       0.00 |      N/A |   0.37 |  0.7 |    2074.52 |  7532.01 |
libraries/cusparse/device                     (5) |     1.70 |   0.00 |  0.0 |       0.01 |      N/A |   0.00 |  0.0 |     115.58 |  7532.01 |
libraries/cusolver/multigpu                   (5) |    13.07 |   0.00 |  0.0 |     545.60 |      N/A |   0.00 |  0.0 |     519.90 |  7532.01 |
core/device/random                            (3) |    39.88 |   0.01 |  0.0 |       0.37 |      N/A |   0.83 |  2.1 |    2476.74 | 14413.74 |
core/device/ldg                               (5) |     6.53 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     588.13 |  7532.01 |
libraries/cusolver/base                       (5) |     0.08 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       1.78 |  7532.01 |
core/pointer                                  (5) |     0.42 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      14.34 |  7532.01 |
base/broadcast                                (5) |    12.81 |   0.02 |  0.2 |       0.00 |      N/A |   0.14 |  1.1 |    1130.10 |  7532.01 |
core/nvml                                     (5) |     0.74 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      60.23 |  7532.01 |
libraries/cusparse/broadcast                  (3) |    61.63 |   0.01 |  0.0 |       0.13 |      N/A |   1.00 |  1.6 |    5687.32 | 14815.96 |
libraries/cusparse/linalg                     (5) |    44.19 |   0.04 |  0.1 |       6.78 |      N/A |   0.77 |  1.7 |    5339.95 |  7532.01 |
libraries/cusolver/sparse_factorizations      (5) |     8.42 |   0.00 |  0.0 |       2.29 |      N/A |   0.13 |  1.6 |    1187.93 |  7532.01 |
base/exceptions                               (3) |   258.46 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       8.06 | 14815.96 |
base/iterator                                 (3) |     2.53 |   0.00 |  0.0 |       1.93 |      N/A |   0.00 |  0.0 |     220.84 | 14815.96 |
base/threading                                (3) |     1.56 |   0.01 |  0.3 |      10.94 |      N/A |   0.00 |  0.0 |      96.83 | 14815.96 |
core/utils                                    (3) |     0.71 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      44.09 | 14815.96 |
core/pool                                     (3) |     7.32 |   0.00 |  0.0 |       0.00 |      N/A |   5.10 | 69.7 |     176.90 | 14815.96 |
libraries/cusparse/sparse_matrices_csr        (3) |     2.31 |   0.00 |  0.1 |       1.48 |      N/A |   0.00 |  0.0 |      80.50 | 14815.96 |
base/linalg                                   (3) |    43.30 |   0.01 |  0.0 |    1554.63 |      N/A |   3.07 |  7.1 |    7640.67 | 16330.24 |
libraries/cusparse/reduce                     (3) |    16.34 |   0.01 |  0.0 |       0.06 |      N/A |   0.00 |  0.0 |    1404.97 | 16330.24 |
libraries/staticarrays                        (3) |     1.24 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |     150.89 | 16330.24 |
base/kernelabstractions                       (3) |    65.84 |   0.08 |  0.1 |      71.03 |      N/A |   0.89 |  1.4 |    4243.13 | 16801.58 |
base/examples                                 (3) |     6.41 |   0.00 |  0.0 |     385.30 |      N/A |   0.10 |  1.6 |    1189.81 | 16801.58 |
libraries/curand                              (3) |     0.11 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       1.89 | 16801.58 |
core/profile                                  (5) |   568.17 |   0.00 |  0.0 |       0.00 |      N/A |   3.13 |  0.6 |    9767.90 |  7532.01 |
Testing finished in 40 minutes, 1 second, 826 milliseconds

Test Summary: |  Pass  Broken  Total  Time
  Overall     | 33900      27  33927
    SUCCESS
     Testing CUDA tests passed

Moreover, I learned that the CUDA toolkit is not needed for PyTorch’s CUDA support, so I uninstalled it, but I am still getting the same error.
I have also tried removing `.julia` from my C drive and reinstalling Julia, but the problem remains…