Hello
I am trying to run `NNlib.scatter!` on a vector of `SVector{3,Float32}`:
```julia
using CUDA, NNlib, StaticArrays

S = rand(SVector{3,Float32}, 5)
DST = zeros(SVector{3,Float32}, 5)
I = [3, 1, 2, 5, 4]

# Works great on CPU
@CUDA.time NNlib.scatter!(+, DST, S, I)
```

```
0.000004 seconds
5-element Vector{SVector{3, Float32}}:
[1.3667885, 1.081403, 1.1134366]
[1.6543723, 0.50424564, 1.1448298]
[0.8695842, 1.8623418, 1.398939]
[1.0094697, 0.0052466393, 0.09184897]
[1.9389011, 0.3291297, 1.2308507]
```

```julia
# Fails to compile on GPU
@CUDA.time NNlib.scatter!(+, CuArray(DST), CuArray(S), CuArray(I))
```

```
ERROR: InvalidIRError: compiling kernel #scatter_kernel!(typeof(+), CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_cas!)
```
Does anyone know why?
This is a minimal example; I need it to work on the GPU for a more complex case.
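A possible workaround, sketched under the assumption that the error comes from CUDA.jl having no `atomic_cas!` method for a 12-byte `SVector{3,Float32}` element (its atomics cover primitive types such as `Float32`): reinterpret the vectors as 3×N `Float32` matrices, so that `scatter!` adds whole columns while every individual atomic update is on a plain `Float32`. The `dev` switch is only there so the sketch also runs on a machine without a GPU.

```julia
using CUDA, NNlib, StaticArrays

S   = rand(SVector{3,Float32}, 5)
DST = zeros(SVector{3,Float32}, 5)
I   = [3, 1, 2, 5, 4]

# CPU reference result, computed directly on the SVector arrays.
ref = NNlib.scatter!(+, copy(DST), S, I)

# Use the GPU when one is available, otherwise stay on the CPU.
dev = CUDA.functional() ? CuArray : identity

# View each Vector{SVector{3,Float32}} as a 3×5 Matrix{Float32}
# (one column per SVector) and move it to the device.
src = dev(collect(reinterpret(reshape, Float32, S)))
dst = dev(collect(reinterpret(reshape, Float32, DST)))

# With 2-D arrays and a 1-D index vector, scatter! adds column src[:, k]
# into column dst[:, I[k]]; on the GPU each Float32 entry is updated atomically.
NNlib.scatter!(+, dst, src, dev(I))

# Copy back to the host and view the 3×5 matrix as a vector of SVectors again.
result = reinterpret(reshape, SVector{3,Float32}, Array(dst))
```

`result` is a `ReinterpretArray`; `collect` it if a plain `Vector{SVector{3,Float32}}` is needed. The same reinterpretation should carry over to other fixed-size element types whose fields are all `Float32`.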
Kind regards