I too am having difficulties running CUDA.jl under WSL 2.
This seems to have solved the problem of the core libraries not being detected: since then, CUDA.functional() returns true.
However, any basic operation like the one below results in an error:
julia> a = CuArray{Float32}(undef, 10)
10-element CuArray{Float32,1}:
julia> b = CuArray{Float32}(undef, 10)
10-element CuArray{Float32,1}:
julia> a + b
ERROR: CUDA error: PTX JIT compiler library not found (code 221, ERROR_JIT_COMPILER_NOT_FOUND)
Stacktrace:
[1] throw_api_error(::CUDA.cudaError_enum) at /home/coz/.julia/packages/CUDA/7vLVC/lib/cudadrv/error.jl:103
[2] CuModule(::String, ::Dict{CUDA.CUjit_option_enum,Any}) at /home/coz/.julia/packages/CUDA/7vLVC/lib/cudadrv/module.jl:42
[3] _cufunction(::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#20",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,CUDA.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}}}},Int64}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/coz/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:337
[4] _cufunction at /home/coz/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:304 [inlined]
[5] check_cache(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#20",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,CUDA.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}}}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/coz/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:24
[6] broadcast_kernel at /home/coz/.julia/packages/GPUArrays/PkHCM/src/host/broadcast.jl:60 [inlined]
[7] cached_compilation(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#20",Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,CUDA.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}}}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/coz/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:0
[8] cached_compilation at /home/coz/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:44 [inlined]
[9] cufunction(::GPUArrays.var"#broadcast_kernel#20", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Float32,1,CUDA.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(+),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CuDeviceArray{Float32,1,CUDA.AS.Global},Tuple{Bool},Tuple{Int64}}}},Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/coz/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:298
[10] cufunction at /home/coz/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:293 [inlined]
[11] #launch_heuristic#816 at /home/coz/.julia/packages/CUDA/7vLVC/src/gpuarrays.jl:19 [inlined]
[12] launch_heuristic at /home/coz/.julia/packages/CUDA/7vLVC/src/gpuarrays.jl:17 [inlined]
[13] copyto! at /home/coz/.julia/packages/GPUArrays/PkHCM/src/host/broadcast.jl:66 [inlined]
[14] copyto!(::CuArray{Float32,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(+),Tuple{CuArray{Float32,1},CuArray{Float32,1}}}) at ./broadcast.jl:864
[15] copy(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},typeof(+),Tuple{CuArray{Float32,1},CuArray{Float32,1}}}) at ./broadcast.jl:840
[16] materialize at ./broadcast.jl:820 [inlined]
[17] broadcast_preserving_zero_d at ./broadcast.jl:809 [inlined]
[18] +(::CuArray{Float32,1}, ::CuArray{Float32,1}) at ./arraymath.jl:39
[19] top-level scope at REPL[7]:1
] test CUDA
also fails, with a gazillion outputs that overflow my terminal.
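For what it's worth, ERROR_JIT_COMPILER_NOT_FOUND suggests the driver's PTX JIT library (libnvidia-ptxjitcompiler.so) cannot be loaded at runtime. Here is a quick check one can run from the WSL shell; the directories below are my assumptions about common driver library locations, not something the error message itself confirms:

```shell
# Look for the driver's PTX JIT compiler library in a few common locations.
# The exact path depends on the WSL/driver setup, so these are guesses.
found=""
for dir in /usr/lib/wsl/lib /usr/lib/x86_64-linux-gnu /usr/lib64; do
    for f in "$dir"/libnvidia-ptxjitcompiler.so*; do
        [ -e "$f" ] && found="$f"
    done
done
if [ -n "$found" ]; then
    echo "PTX JIT library: $found"
else
    echo "PTX JIT library not found in the directories above"
fi
```

If nothing turns up, the driver-side JIT library is simply not visible inside the WSL distro, which would match the error above.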
┌ Info: System information:
│ CUDA toolkit 11.0.194, artifact installation
│ CUDA driver 11.1.0
│
│ Libraries:
│ - CUBLAS: 11.1.0
│ - CURAND: 10.2.1
│ - CUFFT: 10.2.0
│ - CUSOLVER: 10.5.0
│ - CUSPARSE: 11.1.0
│ - CUPTI: 13.0.0
│ - NVML: missing
│ - CUDNN: 8.0.1 (for CUDA 11.0.0)
│ - CUTENSOR: 1.2.0 (for CUDA 11.0.0)
│
│ Toolchain:
│ - Julia: 1.4.2
│ - LLVM: 8.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3
│ - Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│
│ 1 device(s):
└ - GeForce GTX 1050 Ti (sm_61, 3.297 GiB / 4.000 GiB available)
[ Info: Testing using 1 device(s): 1. GeForce GTX 1050 Ti (UUID a2d155cc-31af-3125-b9b2-bb5b43712f8d)
[ Info: Skipping the following tests: nvml, cutensor, device/wmma
| | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization (2) | 3.27 | 0.00 | 0.0 | 0.00 | N/A | 0.06 | 1.9 | 219.96 | 834.47 |
apiutils (2) | 0.27 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 5.74 | 834.88 |
array (2) | failed at 2020-08-03T16:10:57.567
...
I have tried Ubuntu 18.04 and 20.04, with Julia v1.4.2 and v1.5.0. The NVIDIA BlackScholes sample from the CUDA on WSL docs compiles and runs fine.
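In case it is a library-path issue, this is the workaround I am about to try: exporting the WSL driver library directory before starting Julia. The /usr/lib/wsl/lib path is an assumption based on where WSL 2 usually exposes the Windows-side driver libraries; it is not confirmed by anything in the error above.

```shell
# Prepend the WSL driver library directory (assumed location) so the dynamic
# linker can find libcuda / libnvidia-ptxjitcompiler when Julia loads CUDA.jl.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:${LD_LIBRARY_PATH:-}
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# then, from this same shell:
#   julia -e 'using CUDA; CUDA.versioninfo()'
```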
Any idea or glimpse of hope?