I am trying to get `@atomic` to work. I am trying to produce an MWE, so the code by itself doesn’t do anything meaningful.
I have a buffer to which I am trying to atomically add a value for every thread in the block that gets run.
using CUDA
CUDA.allowscalar(false)
function atleast2_gpu_v1!(buffer)
    i = threadIdx().x
    j = 0.0
    @atomic buffer[i] = +(buffer[i], j)
    # @atomic buffer[i] = buffer[i] + j # this also doesn't work
    return
end

threads = 256
buffer = CUDA.zeros(Float32, threads)
blocks = 1_000_000 ÷ threads
@device_code_warntype @cuda threads = threads blocks = blocks atleast2_gpu_v1!(buffer)
and I get this error complaining about the kernel returning a `Union{}`, but I am clearly returning nothing, so it’s really mysterious what’s going on:
PTX CompilerJob of kernel atleast2_gpu_v1!(CuDeviceArray{Float32,1,1}) for sm_75
Variables
#self#::Core.Compiler.Const(atleast2_gpu_v1!, false)
buffer::CuDeviceArray{Float32,1,1}
i::Int64
j::Float64
Body::Union{}
1 ─ %1 = Main.threadIdx()::NamedTuple{(:x, :y, :z),Tuple{Int64,Int64,Int64}}
│ (i = Base.getproperty(%1, :x))
│ (j = 0.0)
│ %4 = Core.tuple(i)::Tuple{Int64}
│ (CUDA.atomic_arrayset)(buffer, %4, Main.:+, j::Core.Compiler.Const(0.0, false))
└── Core.Compiler.Const(:(return), false)
ERROR: LoadError: GPU compilation of kernel atleast2_gpu_v1!(CuDeviceArray{Float32,1,1}) failed
KernelError: kernel returns a value of type `Union{}`
Make sure your kernel function ends in `return`, `return nothing` or `nothing`.If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
Stacktrace:
[1] check_method(::GPUCompiler.CompilerJob) at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\validation.jl:18
[2] macro expansion at C:\Users\RTX2080\.julia\packages\TimerOutputs\dVnaw\src\TimerOutput.jl:206 [inlined]
[3] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\driver.jl:63
[4] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\driver.jl:39
[5] compile at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\driver.jl:35 [inlined]
[6] _cufunction(::GPUCompiler.FunctionSpec{typeof(atleast2_gpu_v1!),Tuple{CuDeviceArray{Float32,1,1}}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\RTX2080\.julia\packages\CUDA\1DBvk\src\compiler\execution.jl:311
[7] _cufunction at C:\Users\RTX2080\.julia\packages\CUDA\1DBvk\src\compiler\execution.jl:305 [inlined]
[8] check_cache(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{typeof(atleast2_gpu_v1!),Tuple{CuDeviceArray{Float32,1,1}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\cache.jl:24
[9] atleast2_gpu_v1! at c:\Users\RTX2080\AppData\Roaming\Code\User\globalStorage\buenon.scratchpads\scratchpads\2a695470f16de4fbb367ab34cdcda714\scratch80..jl:9 [inlined]
[10] cached_compilation(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{typeof(atleast2_gpu_v1!),Tuple{CuDeviceArray{Float32,1,1}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\cache.jl:0
[11] cached_compilation at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\cache.jl:40 [inlined]
[12] cufunction(::typeof(atleast2_gpu_v1!), ::Type{Tuple{CuDeviceArray{Float32,1,1}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\RTX2080\.julia\packages\CUDA\1DBvk\src\compiler\execution.jl:299
[13] cufunction(::typeof(atleast2_gpu_v1!), ::Type{Tuple{CuDeviceArray{Float32,1,1}}}) at C:\Users\RTX2080\.julia\packages\CUDA\1DBvk\src\compiler\execution.jl:294
[14] top-level scope at C:\Users\RTX2080\.julia\packages\CUDA\1DBvk\src\compiler\execution.jl:109
[15] top-level scope at C:\Users\RTX2080\.julia\packages\GPUCompiler\5xT46\src\reflection.jl:144
[16] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
in expression starting at c:\Users\RTX2080\AppData\Roaming\Code\User\globalStorage\buenon.scratchpads\scratchpads\2a695470f16de4fbb367ab34cdcda714\scratch80..jl:40
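For what it’s worth, my current guess is that `Body::Union{}` means the atomic call itself would throw: `j = 0.0` is a `Float64`, the buffer is `Float32`, and I don’t think there is an atomic add for that mixed combination, so inference concludes the function never returns. An untested sketch of what I would try instead, assuming the type mismatch is indeed the culprit (note the `Float32` literal and the `+=` form):

using CUDA
CUDA.allowscalar(false)

function atleast2_gpu_v2!(buffer)
    i = threadIdx().x
    j = 1.0f0              # Float32 literal, matching eltype(buffer)
    @atomic buffer[i] += j # atomic add with agreeing types
    return
end

threads = 256
buffer = CUDA.zeros(Float32, threads)
blocks = 1_000_000 ÷ threads
@cuda threads = threads blocks = blocks atleast2_gpu_v2!(buffer)

If someone can confirm whether the `Float64`/`Float32` mismatch is really what produces the `Union{}` here, I’d appreciate it.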