I tried it on Windows 10 with a freshly installed CUDA 11.1 and get test failures and exceptions in cublas:
CUDA toolkit 11.1.1, artifact installation
CUDA driver 11.1.0
NVIDIA driver 456.81.0

Libraries:
- CUBLAS: 11.3.0
- CURAND: 10.2.2
- CUFFT: 10.3.0
- CUSOLVER: 11.0.1
- CUSPARSE: 11.3.0
- CUPTI: 14.0.0
- NVML: 11.0.0+456.81
- CUDNN: 8.0.4 (for CUDA 11.1.0)
- CUTENSOR: 1.2.1 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: GeForce 940MX (sm_50, 1.875 GiB / 2.000 GiB available)
Maybe this has to do with a feature that is missing from my GeForce 940MX card? (A minimal repro sketch is at the end, after the stack trace.)
cublas: Error During Test at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\cublas.jl:1250
Got exception outside of a @test
CUBLASError: an absent device architectural feature is required (code 8, CUBLAS_STATUS_ARCH_MISMATCH)
Stacktrace:
[1] throw_api_error(::CUDA.CUBLAS.cublasStatus_t) at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\error.jl:47
[2] macro expansion at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\error.jl:58 [inlined]
[3] cublasGemmEx(::Ptr{Nothing}, ::Char, ::Char, ::Int64, ::Int64, ::Int64, ::Base.RefValue{Float16}, ::CuArray{Float16,2}, ::Type{T} where T, ::Int64, ::CuArray{Float16,2}, ::Type{T} where T, ::Int64, ::Base.RefValue{Float16}, ::CuArray{Float16,2}, ::Type{T} where T, ::Int64, ::CUDA.CUBLAS.cublasComputeType_t, ::CUDA.CUBLAS.cublasGemmAlgo_t) at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\utils\call.jl:93
[4] gemmEx!(::Char, ::Char, ::Number, ::Union{CuArray{T,2}, CuArray{T,1}} where T, ::Union{CuArray{T,2}, CuArray{T,1}} where T, ::Number, ::Union{CuArray{T,2}, CuArray{T,1}} where T; algo::CUDA.CUBLAS.cublasGemmAlgo_t) at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\wrappers.jl:836
[5] gemmEx! at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\wrappers.jl:818 [inlined]
[6] gemm_dispatch!(::CuArray{Float16,2}, ::CuArray{Float16,2}, ::CuArray{Float16,2}, ::Bool, ::Bool) at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\linalg.jl:216
[7] mul! at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\lib\cublas\linalg.jl:227 [inlined]
[8] mul!(::CuArray{Float16,2}, ::CuArray{Float16,2}, ::CuArray{Float16,2}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\LinearAlgebra\src\matmul.jl:208
[9] top-level scope at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\cublas.jl:1275
[10] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
[11] top-level scope at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\cublas.jl:1251
[12] top-level scope at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115
[13] top-level scope at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\cublas.jl:438
[14] include(::String) at .\client.jl:457
[15] #9 at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\runtests.jl:78 [inlined]
[16] macro expansion at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\setup.jl:47 [inlined]
[17] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Test\src\Test.jl:1115 [inlined]
[18] macro expansion at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\setup.jl:47 [inlined]
[19] macro expansion at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\src\utilities.jl:35 [inlined]
[20] macro expansion at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\src\pool.jl:564 [inlined]
[21] top-level scope at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\setup.jl:46
[22] eval at .\boot.jl:331 [inlined]
[23] runtests(::Function, ::String, ::Symbol, ::Nothing) at C:\Users\pi96doc\.julia\packages\CUDA\YeS8q\test\setup.jl:58
[24] (::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}})() at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294
[25] run_work_thunk(::Distributed.var"#106#108"{Distributed.CallMsg{:call_fetch}}, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:79
[26] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Distributed\src\process_messages.jl:294 [inlined]
[27] (::Distributed.var"#105#107"{Distributed.CallMsg{:call_fetch},Distributed.MsgHeader,Sockets.TCPSocket})() at .\task.jl:356
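For reference, here is a minimal sketch of what I think the failing test boils down to, judging from the stack trace (a Float16 `mul!` on CuArrays that dispatches to `CUBLAS.gemmEx!`). The matrix sizes are made up, and it assumes CUDA.jl is installed and working; if the arch-mismatch hypothesis is right, it should throw the same error on this card:

```julia
using CUDA, LinearAlgebra

# Report the compute capability of the current device
# (v"5.0" on my GeForce 940MX).
@show CUDA.capability(CUDA.device())

# Hypothetical sizes, just for illustration.
A = CuArray(rand(Float16, 128, 128))
B = CuArray(rand(Float16, 128, 128))
C = similar(A)

# Float16 matrix multiplication goes through CUBLAS.gemmEx!, so I would
# expect this to raise CUBLAS_STATUS_ARCH_MISMATCH on an sm_50 device
# just like the test does.
mul!(C, A, B)
```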