I’m going through the tutorial https://juliagpu.gitlab.io/CuArrays.jl/tutorials/generated/intro/ and have run into some problems and errors. First the problems:
I’m starting as in the tutorial:

```julia
using CuArrays, CUDAnative, BenchmarkTools

N = 2^20
x = CuArrays.fill(1f0, N)
y = CuArrays.fill(2f0, N)

function gpu_add1!(y, x)
    for i = 1:length(y)
        @inbounds y[i] += x[i]
    end
    return nothing
end
```
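If I understand the tutorial page correctly, it then checks the kernel roughly like this (reconstructed from memory, so treat it as a sketch):

```julia
using Test

fill!(y, 2)                    # reset y between experiments
@cuda gpu_add1!(y, x)          # launch the kernel (single thread)
@test all(Array(y) .== 3.0f0)  # every element should now be 1 + 2
```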
Here I have a question about `@inbounds`, which is supposed to skip checking whether an index is in range. Does somebody have an example that explicitly shows the difference?
When looping one past the end of an Array `A` with `N` elements,

```julia
for i = 1:N+1
    @inbounds A[i] = 1
end
```

throws the same out-of-bounds error as it does without `@inbounds`, so I would like to understand where it makes a difference.
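Should I expect different behaviour if the loop sits inside a compiled function instead of at top level? Something like this (`fill_all!` is just a name I made up for the experiment):

```julia
# Hypothetical variant of my test: does @inbounds only take
# effect inside a function like this one?
function fill_all!(A, n)
    for i = 1:n+1        # deliberately runs one past the end
        @inbounds A[i] = 1
    end
    return nothing
end
```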
Next I want to call `gpu_add1!(y, x)`. What happens if I do not prefix it with `@cuda`? Does it run on the CPU, but with graphics-card memory? It seems to take longer than `@cuda gpu_add1!(y, x)`.
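Concretely, this is roughly what I’m comparing (a sketch of my benchmark calls; the `$`-interpolation follows the BenchmarkTools docs):

```julia
using BenchmarkTools

# Without @cuda: does this run the loop as ordinary Julia code
# on the CPU, indexing element by element into GPU memory?
@btime gpu_add1!($y, $x)

# With @cuda: the same function is compiled and launched as a
# (single-threaded) GPU kernel.
@btime @cuda gpu_add1!($y, $x)
```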
I’m then running

```julia
@btime @cuda gpu_add1!(y, x)
```

and get the error:

```
ERROR: CUDA error: the launch timed out and was terminated (code 702, ERROR_LAUNCH_TIMEOUT)
Stacktrace:
[1] cuLaunchKernel(::CUDAdrv.CuFunction, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::UInt32, ::Int64, ::CUDAdrv.CuStream, ::Array{Ptr{Nothing},1}, ::Ptr{Nothing}) at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\error.jl:123
[2] (::CUDAdrv.var"#350#351"{Bool,Int64,CUDAdrv.CuStream,CUDAdrv.CuFunction})(::Array{Ptr{Nothing},1}) at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:97
[3] macro expansion at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:63 [inlined]
[4] macro expansion at .\gcutils.jl:91 [inlined]
[5] macro expansion at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:61 [inlined]
[6] pack_arguments(::CUDAdrv.var"#350#351"{Bool,Int64,CUDAdrv.CuStream,CUDAdrv.CuFunction}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}) at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:40
[7] #launch#349(::Int64, ::Int64, ::Bool, ::Int64, ::CUDAdrv.CuStream, ::typeof(CUDAdrv.launch), ::CUDAdrv.CuFunction, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,1,CUDAnative.AS.Global},N} where N) at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:90
[8] launch at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:85 [inlined]
[9] #355 at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:164 [inlined]
[10] macro expansion at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:125 [inlined]
[11] macro expansion at .\gcutils.jl:91 [inlined]
[12] macro expansion at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:124 [inlined]
[13] convert_arguments at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:108 [inlined]
[14] #cudacall#354 at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:163 [inlined]
[15] cudacall at C:\Users\Diger\.julia\packages\CUDAdrv\3EzC1\src\execution.jl:163 [inlined]
[16] #cudacall#199 at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:282 [inlined]
[17] cudacall at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:279 [inlined]
[18] macro expansion at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:263 [inlined]
[19] #call#187(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.call), ::CUDAnative.HostKernel{gpu_add1!,Tuple{CuDeviceArray{Float32,1,CUDAnative.AS.Global},CuDeviceArray{Float32,1,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}) at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:240
[20] call(::CUDAnative.HostKernel{gpu_add1!,Tuple{CuDeviceArray{Float32,1,CUDAnative.AS.Global},CuDeviceArray{Float32,1,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,1,CUDAnative.AS.Global},N} where N) at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:240
[21] #_#204(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::CUDAnative.HostKernel{gpu_add1!,Tuple{CuDeviceArray{Float32,1,CUDAnative.AS.Global},CuDeviceArray{Float32,1,CUDAnative.AS.Global}}}, ::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,1,CUDAnative.AS.Global},N} where N) at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:454
[22] (::CUDAnative.HostKernel{gpu_add1!,Tuple{CuDeviceArray{Float32,1,CUDAnative.AS.Global},CuDeviceArray{Float32,1,CUDAnative.AS.Global}}})(::CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::Vararg{CuDeviceArray{Float32,1,CUDAnative.AS.Global},N} where N) at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:454
[23] macro expansion at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:178 [inlined]
[24] macro expansion at .\gcutils.jl:91 [inlined]
[25] macro expansion at C:\Users\Diger\.julia\packages\CUDAnative\RhbZ0\src\execution.jl:173 [inlined]
[26] ##core#408() at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:297
[27] ##sample#409(::BenchmarkTools.Parameters) at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:305
[28] sample at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:320 [inlined]
[29] #_lineartrial#41(::Int64, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(BenchmarkTools._lineartrial), ::BenchmarkTools.Benchmark{Symbol("##benchmark#407")}, ::BenchmarkTools.Parameters) at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:71
[30] _lineartrial(::BenchmarkTools.Benchmark{Symbol("##benchmark#407")}, ::BenchmarkTools.Parameters) at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:63
[31] #invokelatest#1 at .\essentials.jl:709 [inlined]
[32] invokelatest at .\essentials.jl:708 [inlined]
[33] #lineartrial#38 at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:33 [inlined]
[34] lineartrial at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:33 [inlined]
[35] #tune!#44(::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(tune!), ::BenchmarkTools.Benchmark{Symbol("##benchmark#407")}, ::BenchmarkTools.Parameters) at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:135
[36] tune! at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:134 [inlined] (repeats 2 times)
[37] top-level scope at C:\Users\Diger\.julia\packages\BenchmarkTools\7aqwe\src\execution.jl:391
```
A last error occurs when calling

```julia
index = threadIdx().x
```

in the REPL to see what it does. It returns a very long error message which I cannot post, because the window crashes at the end. The same happens with

```julia
stride = blockDim().x
```

I’m running Julia on Windows 10.
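For context, those two lines come from the tutorial’s next kernel, which as far as I understand is only meant to be executed inside an `@cuda` launch. Roughly as on the linked page:

```julia
# The tutorial's parallel kernel: threadIdx() and blockDim()
# are only defined while running on the device.
function gpu_add2!(y, x)
    index = threadIdx().x    # this thread's index within the block
    stride = blockDim().x    # number of threads in the block
    for i = index:stride:length(y)
        @inbounds y[i] += x[i]
    end
    return nothing
end

# The tutorial launches it with something like:
# @cuda threads=256 gpu_add2!(y, x)
```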