CUDA errors 203 and 103, CUDA.rand() and CUDA.versioninfo()

I get the error below when attempting to use CUDA.rand():

julia> using CUDA
julia> CUDA.rand()
ERROR: CURANDError: initialization of CUDA failed (code 203, CURAND_STATUS_INITIALIZATION_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.CURAND.curandStatus)
@ CUDA.CURAND C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\error.jl:56
[2] macro expansion
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\error.jl:69 [inlined]
[3] curandCreateGenerator(typ::CUDA.CURAND.curandRngType)
@ CUDA.CURAND C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\wrappers.jl:5
[4] CUDA.CURAND.RNG(typ::CUDA.CURAND.curandRngType; stream::CuStream)
@ CUDA.CURAND C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\random.jl:13
[5] RNG (repeats 2 times)
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\random.jl:13 [inlined]
[6] #167
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\CURAND.jl:38 [inlined]
[7] (::CUDA.APIUtils.var"#8#11"{CUDA.CURAND.var"#167#173", CUDA.APIUtils.HandleCache{CuContext, CUDA.CURAND.RNG}, CuContext})()
@ CUDA.APIUtils C:\Users\DIi.julia\packages\CUDA\BbliS\lib\utils\cache.jl:22
[8] lock(f::CUDA.APIUtils.var"#8#11"{CUDA.CURAND.var"#167#173", CUDA.APIUtils.HandleCache{CuContext, CUDA.CURAND.RNG}, CuContext}, l::ReentrantLock)
@ Base .\lock.jl:187
[9] check_cache
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\utils\cache.jl:20 [inlined]
[10] pop!
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\utils\cache.jl:41 [inlined]
[11] (::CUDA.CURAND.var"#new_state#172")(cuda::NamedTuple{(:device, :context, :stream, :math_mode, :math_precision), Tuple{CuDevice, CuContext, CuStream, CUDA.MathMode, Symbol}})
@ CUDA.CURAND C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\CURAND.jl:37
[12] #170
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\CURAND.jl:51 [inlined]
[13] get!(default::CUDA.CURAND.var"#170#176"{CUDA.CURAND.var"#new_state#172", NamedTuple{(:device, :context, :stream, :math_mode, :math_precision), Tuple{CuDevice, CuContext, CuStream, CUDA.MathMode, Symbol}}}, h::Dict{CuContext, NamedTuple{(:rng,), Tuple{CUDA.CURAND.RNG}}}, key::CuContext)
@ Base .\dict.jl:465
[14] default_rng()
@ CUDA.CURAND C:\Users\DIi.julia\packages\CUDA\BbliS\lib\curand\CURAND.jl:50
[15] curand_rng
@ C:\Users\DIi.julia\packages\CUDA\BbliS\src\random.jl:262 [inlined]
[16] rand(::Type{Float32}, ::Int64)
@ CUDA C:\Users\DIi.julia\packages\CUDA\BbliS\src\random.jl:282
[17] rand(T::Type) (repeats 2 times)
@ CUDA C:\Users\DIi.julia\packages\CUDA\BbliS\src\random.jl:326
[18] top-level scope
@ REPL[4]:1
[19] top-level scope
@ C:\Users\DIi.julia\packages\CUDA\BbliS\src\initialization.jl:52

CUDA.versioninfo() also fails:

julia> CUDA.versioninfo()
ERROR: CUDA error (code 103, UnknownMember)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA C:\Users\DIi.julia\packages\CUDA\BbliS\lib\cudadrv\error.jl:89
[2] macro expansion
@ C:\Users\DIi.julia\packages\CUDA\BbliS\lib\cudadrv\error.jl:97 [inlined]
[3] runtime_version()
@ CUDA C:\Users\DIi.julia\packages\CUDA\BbliS\lib\cudadrv\version.jl:44
[4] versioninfo(io::Base.TTY) (repeats 2 times)
@ CUDA C:\Users\DIi.julia\packages\CUDA\BbliS\src\utilities.jl:32
[5] top-level scope
@ REPL[5]:1
[6] top-level scope
@ C:\Users\DIi.julia\packages\CUDA\BbliS\src\initialization.jl:52

But otherwise CUDA seems to work.
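To be concrete, basic array operations and broadcasts run fine; something like this (a minimal sanity check, not from my actual session) works without error:

julia> using CUDA

julia> a = CuArray(Float32[1, 2, 3]);

julia> sum(a .+ 1)
9.0f0

Only the cuRAND path (CUDA.rand) and versioninfo() fail.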

Any ideas what the issue could be? I have already tried searching for the relevant error codes (203 and 103) and found little by way of answers.

The machine:

OS Name Microsoft Windows 10 Enterprise
Version 10.0.19044 Build 19044
Other OS Description Not Available
System Manufacturer Dell Inc.
System Model Precision 7740
System Type x64-based PC
Processor Intel(R) Xeon(R) E-2276M CPU @ 2.80GHz, 2808 Mhz, 6 Core(s), 12 Logical Processor(s)
Installed Physical Memory (RAM) 64.0 GB
Name NVIDIA Quadro RTX 4000
PNP Device ID PCI\VEN_10DE&DEV_1EB6&SUBSYS_09271028&REV_A1\4&F404CF9&0&0008
Adapter Type Quadro RTX 4000, NVIDIA compatible
Adapter Description NVIDIA Quadro RTX 4000
Adapter RAM (1,048,576) bytes
Driver Version 30.0.15.1006

Note: for whatever reason the OS reports the video RAM as 1 GB (which matches the integrated Intel card). Nvitop, the NVIDIA Control Panel, and CUDA.jl all recognize it as 8 GB, as expected.

That error occurs during use of the CUDA runtime library, which is why the error code isn’t recognized (it’s a runtime error code, not a driver one). From the docs:

cudaErrorSoftwareValidityNotEstablished = 103

By default, the CUDA runtime may perform a minimal set of self-tests, as well as CUDA driver tests, to establish the validity of both. Introduced in CUDA 11.2, this error return indicates that at least one of these tests has failed and the validity of either the runtime or the driver could not be established.

So your system seems broken.
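One way to confirm it’s the runtime side that’s broken (and not the driver) is to compare the two version queries directly, e.g. (a sketch; on a healthy system both return a VersionNumber):

julia> using CUDA

julia> CUDA.driver_version()

julia> CUDA.runtime_version()

If runtime_version() throws the same error 103 while driver_version() succeeds, that’s consistent with the runtime self-tests failing.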


Yeah I figured.

I was more asking if anyone has encountered something similar and how they might have solved it. I will start with reinstalling NVIDIA drivers and go from there.

Thanks.

Not on Windows, sorry. On Linux, I’d recommend checking dmesg where driver/library mismatches are reported. I’m not sure if Windows has an equivalent (i.e., if the NVIDIA driver logs anything in the Windows event log).


Good idea. Logs are captured to Event Viewer; unfortunately, no loggable events were recorded.

BTW, you can also find events by going into Device Manager, selecting the display adapter, opening its Properties window, and navigating to the “Events” tab, which is more straightforward in that the events are already filtered by device.

Reinstalling the video driver seems to have fixed the issue:

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
Unknown NVIDIA driver, for CUDA 12.0
CUDA driver 12.0

Libraries:

  • CUBLAS: 11.10.1
  • CURAND: 10.2.10
  • CUFFT: 10.7.1
  • CUSOLVER: 11.3.5
  • CUSPARSE: 11.7.3
  • CUPTI: 17.0.0
  • NVML: missing
    Downloaded artifact: CUDNN
  • CUDNN: 8.30.2 (for CUDA 11.5.0)
    Downloaded artifact: CUTENSOR
  • CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:

  • Julia: 1.6.3
  • LLVM: 11.0.1
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
  • Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
0: Quadro RTX 4000 (sm_75, 6.975 GiB / 8.000 GiB available)

Great!