Hello,
I'm having some platform-dependent problems getting CUDAnative working all the way. On my Ubuntu system I have no problems, but on my Windows machines the package tests fail and I can't make heads or tails of it. I see the same errors on two different systems with different GPUs, both running Windows 10. I'm including the output that comes out of testing CUDAnative in the package manager.
Any ideas as to what the problem is and how I can fix it? I would really like to get this working on Windows.
Apologies for the newbie post.
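For reference, here is a minimal standalone kernel along the lines of what the failing "argument passing" test seems to exercise, so the problem can be reproduced outside the test suite (a sketch, assuming CUDAnative v2.1 / CUDAdrv v3.0 as in the manifest below; on the failing machines I'd expect it to print `1 0 2` instead of `1 2 3`):

```julia
using CUDAnative, CUDAdrv

# Print three integers from the device. The test that fails below
# expects the output "1 2 3" but observes "1 0 2".
function kernel()
    @cuprintf("%d %d %d\n", Int32(1), Int32(2), Int32(3))
    return
end

@cuda kernel()
CUDAdrv.synchronize()
```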
(v1.1) pkg> test CUDAnative
Testing CUDAnative
Resolving package versions...
Status `C:\Users\matda\AppData\Local\Temp\jl_50F2.tmp\Manifest.toml`
[79e6a3ab] Adapt v0.4.2
[3895d2a7] CUDAapi v0.6.3
[c5f51814] CUDAdrv v3.0.0
[be33ccc6] CUDAnative v2.1.0
[a8cc5b0e] Crayons v4.0.0
[864edb3b] DataStructures v0.15.0
[929cbde3] LLVM v1.1.0
[bac558e1] OrderedCollections v1.1.0
[a759f4b9] TimerOutputs v0.5.0
[2a0f44e3] Base64 [`@stdlib/Base64`]
[ade2ca70] Dates [`@stdlib/Dates`]
[8ba89e20] Distributed [`@stdlib/Distributed`]
[b77e0a4c] InteractiveUtils [`@stdlib/InteractiveUtils`]
[76f85450] LibGit2 [`@stdlib/LibGit2`]
[8f399da3] Libdl [`@stdlib/Libdl`]
[37e2e46d] LinearAlgebra [`@stdlib/LinearAlgebra`]
[56ddb016] Logging [`@stdlib/Logging`]
[d6f4376e] Markdown [`@stdlib/Markdown`]
[44cfe95a] Pkg [`@stdlib/Pkg`]
[de0858da] Printf [`@stdlib/Printf`]
[3fa0cd96] REPL [`@stdlib/REPL`]
[9a3f8284] Random [`@stdlib/Random`]
[ea8e919c] SHA [`@stdlib/SHA`]
[9e88b42a] Serialization [`@stdlib/Serialization`]
[6462fe0b] Sockets [`@stdlib/Sockets`]
[8dfed614] Test [`@stdlib/Test`]
[cf7118a7] UUIDs [`@stdlib/UUIDs`]
[4ec0a83e] Unicode [`@stdlib/Unicode`]
[ Info: Testing using device GeForce GTX 1060
argument passing: Test Failed at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:876
Expression: out == "1 2 3"
Evaluated: "1 0 2" == "1 2 3"
Stacktrace:
[1] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:876
[2] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[3] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:863
[4] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[5] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:808
[6] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[7] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:5
(the same "argument passing" failure at execution.jl:876 is reported three more times, with identical stack traces)
cooperative groups: Error During Test at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:952
Got exception outside of a @test
CUDA error: operation not supported (code #801, ERROR_NOT_SUPPORTED)
Stacktrace:
[1] macro expansion at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\base.jl:147 [inlined]
[2] (::getfield(CUDAdrv, Symbol("##25#26")){Bool,Int64,CuStream,CuFunction})(::Array{Ptr{Nothing},1}) at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:92
[3] macro expansion at .\gcutils.jl:87 [inlined]
[4] macro expansion at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:61 [inlined]
[5] pack_arguments(::getfield(CUDAdrv, Symbol("##25#26")){Bool,Int64,CuStream,CuFunction}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}) at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:40
[6] #launch#24(::Int64, ::Int64, ::Bool, ::Int64, ::CuStream, ::Function, ::CuFunction, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDAnative.AS.Global},N} where N) at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:90
[7] #launch at .\none:0 [inlined]
[8] #30 at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:179 [inlined]
[9] macro expansion at .\gcutils.jl:87 [inlined]
[10] macro expansion at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:139 [inlined]
[11] convert_arguments at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:123 [inlined]
[12] #cudacall#29 at C:\Users\matda\.julia\packages\CUDAdrv\3cR2F\src\execution.jl:178 [inlined]
[13] #cudacall at .\none:0 [inlined]
[14] #cudacall#155 at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\src\execution.jl:272 [inlined]
[15] #cudacall at .\none:0 [inlined]
[16] macro expansion at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\src\execution.jl:257 [inlined]
[17] #call#143(::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}, ::typeof(CUDAnative.call), ::CUDAnative.HostKernel{getfield(Main, Symbol("#kernel_vadd#324"))(),Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}) at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\src\execution.jl:234
[18] (::getfield(CUDAnative, Symbol("#kw##call")))(::NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}, ::typeof(CUDAnative.call), ::CUDAnative.HostKernel{getfield(Main, Symbol("#kernel_vadd#324"))(),Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDAnative.AS.Global},N} where N) at .\none:0
[19] #call#158(::Base.Iterators.Pairs{Symbol,Integer,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}}, ::CUDAnative.HostKernel{getfield(Main, Symbol("#kernel_vadd#324"))(),Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDAnative.AS.Global},N} where N) at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\src\execution.jl:395
[20] (::getfield(CUDAnative, Symbol("#kw#HostKernel")))(::NamedTuple{(:cooperative, :threads, :blocks),Tuple{Bool,Int64,Int64}}, ::CUDAnative.HostKernel{getfield(Main, Symbol("#kernel_vadd#324"))(),Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,2,CUDAnative.AS.Global},N} where N) at .\none:0
[21] top-level scope at gcutils.jl:87
[22] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\src\execution.jl:171
[23] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:968
[24] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[25] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:953
[26] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[27] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\device\execution.jl:5
[28] include at .\boot.jl:326 [inlined]
[29] include_relative(::Module, ::String) at .\loading.jl:1038
[30] include(::Module, ::String) at .\sysimg.jl:29
[31] include(::String) at .\client.jl:403
[32] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\runtests.jl:70
[33] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Test\src\Test.jl:1083
[34] top-level scope at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\runtests.jl:13
[35] include at .\boot.jl:326 [inlined]
[36] include_relative(::Module, ::String) at .\loading.jl:1038
[37] include(::Module, ::String) at .\sysimg.jl:29
[38] include(::String) at .\client.jl:403
[39] top-level scope at none:0
[40] eval(::Module, ::Any) at .\boot.jl:328
[41] exec_options(::Base.JLOptions) at .\client.jl:243
[42] _start() at .\client.jl:436
────────────────────────────────────────────────────────────────────────────────────
                                          Time                    Allocations
                                  ──────────────────────   ───────────────────────
Tot / % measured: 333s / 5.94% 9.55GiB / 17.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
LLVM middle-end 309 8.98s 45.5% 29.1ms 637MiB 38.3% 2.06MiB
IR generation 309 6.49s 32.9% 21.0ms 566MiB 34.0% 1.83MiB
emission 309 3.96s 20.0% 12.8ms 399MiB 23.9% 1.29MiB
rewrite 308 2.32s 11.8% 7.54ms 162MiB 9.74% 540KiB
lower throw 308 666ms 3.37% 2.16ms 54.1MiB 3.25% 180KiB
hide unreachable 1.76k 375ms 1.90% 213μs 22.7MiB 1.36% 13.2KiB
predecessors 1.76k 179ms 0.91% 102μs 12.8MiB 0.77% 7.43KiB
find 1.76k 102ms 0.52% 58.1μs 438KiB 0.03% 255B
replace 1.76k 77.8ms 0.39% 44.2μs 5.33MiB 0.32% 3.10KiB
hide trap 308 56.2ms 0.28% 182μs 3.08MiB 0.19% 10.2KiB
clean-up 308 129ms 0.66% 420μs 5.05MiB 0.30% 16.8KiB
linking 308 64.6ms 0.33% 210μs 4.81KiB 0.00% 16.0B
optimization 303 2.24s 11.4% 7.40ms 68.9MiB 4.14% 233KiB
device library 1 103ms 0.52% 103ms 1.44KiB 0.00% 1.44KiB
runtime library 63 70.4ms 0.36% 1.12ms 74.0KiB 0.00% 1.18KiB
validation 548 6.96s 35.3% 12.7ms 586MiB 35.2% 1.07MiB
CUDA object generation 216 3.03s 15.3% 14.0ms 437MiB 26.2% 2.02MiB
linking 216 1.86s 9.42% 8.62ms 216MiB 13.0% 1.00MiB
compilation 216 1.16s 5.90% 5.39ms 220MiB 13.2% 1.02MiB
LLVM back-end 254 773ms 3.92% 3.04ms 6.00MiB 0.36% 24.2KiB
machine-code generation 254 663ms 3.36% 2.61ms 953KiB 0.06% 3.75KiB
preparation 254 110ms 0.55% 431μs 5.06MiB 0.30% 20.4KiB
Julia front-end 310 4.25ms 0.02% 13.7μs 81.3KiB 0.00% 269B
strip debug info 68 377μs 0.00% 5.54μs 0.00B 0.00% 0.00B
────────────────────────────────────────────────────────────────────────────────────
Test Summary: | Pass Fail Error Broken Total
CUDAnative | 354 4 1 1 360
base interface | No tests
pointer | 20 20
code generation | 90 1 91
code generation (relying on a device) | 6 6
execution | 68 4 1 73
@cuda | 10 10
argument passing | 28 28
exceptions | 17 17
shmem divergence bug | 7 7
dynamic parallelism | 6 4 10
basic usage | 1 1
anonymous functions | 1 1
closures | 1 1
argument passing | 1 4 5
self-recursion | 1 1
deep recursion | 1 1
cooperative groups | 1 1
pointer | 41 41
device arrays | 20 20
CUDA functionality | 98 98
examples | 6 6
ERROR: LoadError: Some tests did not pass: 354 passed, 4 failed, 1 errored, 1 broken.
in expression starting at C:\Users\matda\.julia\packages\CUDAnative\wU0tS\test\runtests.jl:9
ERROR: Package CUDAnative errored during testing
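In case it helps with diagnosis: the cooperative groups failure is a plain CUDA driver error (code #801, ERROR_NOT_SUPPORTED), which I understand can simply mean the device/driver combination doesn't support cooperative launch at all; on Windows this may depend on whether the GPU runs under the WDDM or TCC driver model. That can be checked with nvidia-smi (assuming it's on the PATH):

```shell
# Show which driver model each GPU is running under (WDDM vs. TCC);
# cooperative launch support on Windows may depend on this.
nvidia-smi --query-gpu=name,driver_model.current --format=csv
```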