CUDA.@atomic causes type instability?

I am not sure why adding CUDA.@atomic to the code below causes an error.

using CUDA, LinearAlgebra

# works fine without CUDA.@atomic
function f(test)
    broadcast(CuVector(1:2), transpose(CuVector(1:3))) do i,j
        v::Float32 = j
        test[i] += v
    end
end

f(CUDA.zeros(Float32,2))

# ERROR with CUDA.@atomic
function g(test)
    broadcast(CuVector(1:2), transpose(CuVector(1:3))) do i,j
        v::Float32 = j
        CUDA.@atomic test[i] += v
    end
end

g(CUDA.zeros(Float32,2))
ERROR: LoadError: GPU broadcast resulted in non-concrete element type Union{}.
This probably means that the function you are broadcasting contains an error or type instability.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] copy
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/broadcast.jl:44 [inlined]
 [3] materialize
   @ ./broadcast.jl:883 [inlined]
 [4] broadcast(::var"#45#46"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, ::LinearAlgebra.Transpose{Int64, CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}})
   @ Base.Broadcast ./broadcast.jl:821
 [5] g(test::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})

GPU broadcast resulted in non-concrete element type Union{}.

A return type of Union{} means that the function always throws. Are you sure this is correct usage of CUDA.@atomic? For example:

julia> test = CuArray([1,2,3]);

julia> CUDA.@atomic test[1] += 1
ERROR: MethodError: no method matching atomic_add!(::CuPtr{Int64}, ::Int64)
Closest candidates are:
  atomic_add!(::Union{Core.LLVMPtr{Int64, 0}, Core.LLVMPtr{Int64, 1}, Core.LLVMPtr{Int64, 3}}, ::Int64) at ...

CUDA.@atomic only works in a kernel context.

Your usage looks OK, though, so try it in a regular kernel, where you can use @device_code_warntype to see what is going wrong, instead of having our broadcast implementation here just bail out.
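For example, a minimal sketch of that approach could look like the following (the kernel name, index math, and launch dimensions are only illustrative, not from this thread):

using CUDA

function atomic_accum!(test, n)
    # one thread per (i, j) pair
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    if i <= length(test) && j <= n
        v = Float32(j)
        CUDA.@atomic test[i] += v   # fine here: we are in kernel context
    end
    return nothing
end

test = CUDA.zeros(Float32, 2)
# print the inferred types of the device code while launching
@device_code_warntype @cuda threads=(2, 3) atomic_accum!(test, 3)

@device_code_warntype shows the typed IR of the compiled device code, the same way @code_warntype does for host code, so any type instability around the atomic update should show up there.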


Thanks, I was being lazy and using broadcast to sidestep the need to manually launch a kernel, call the occupancy API, assign blocks and threads, etc. :smiley:
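For reference, those manual steps look roughly like this (a sketch only; the kernel accum! and the flat indexing are made up for illustration):

using CUDA

function accum!(test, m, n)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= m * n
        i = (idx - 1) % m + 1      # row index
        j = (idx - 1) ÷ m + 1      # column index
        CUDA.@atomic test[i] += Float32(j)
    end
    return nothing
end

test = CUDA.zeros(Float32, 2)
m, n = 2, 3
# compile without launching, then ask the occupancy API for a launch configuration
kernel = @cuda launch=false accum!(test, m, n)
config = launch_configuration(kernel.fun)
threads = min(m * n, config.threads)
blocks = cld(m * n, threads)
kernel(test, m, n; threads, blocks)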

It’s a pattern that should probably work, though, so feel free to open an issue about it as well!

Done: https://github.com/JuliaGPU/CUDA.jl/issues/1253