Why does NNlib.scatter! fail on the GPU for an array of SVectors?

Hello!

New to the Flux library.

I went through the documentation and understand how scatter! works for one-dimensional arrays. I am now trying to use it on the GPU. The code I use is:

using Flux, CUDA, StaticArrays

T = Float32
NL = 10^6

src = CuArray(rand(SVector{3,T},NL))
idx = CuArray(rand(1:6195, NL))
dst = CuArray(zeros(SVector{3,T}, NL))
NNlib.scatter!(+, dst, src, idx)

This throws the following error:

ERROR: InvalidIRError: compiling kernel #scatter_kernel!(typeof(+), CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_cas!)

I’ve tested the CPU version and it works great - how come it breaks here?
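
For reference, here is a minimal sketch of the CPU check, using plain Arrays and a smaller NL so it runs quickly. It also runs the same scatter on the data viewed as a 3×NL matrix of Float32 and compares the two results. The names dst_vec, src_mat and dst_mat are only for this illustration. My guess, which I have not verified on the GPU, is that the SVector element type is what the atomic operation in the kernel cannot handle, and that the flat Float32 layout might avoid it.

```julia
using NNlib, StaticArrays

T  = Float32
NL = 10^4   # smaller than the 10^6 above, only to keep the check quick

src = rand(SVector{3,T}, NL)
idx = rand(1:6195, NL)

# Layout 1: vector of SVectors, same call as on the GPU but with plain Arrays.
dst_vec = zeros(SVector{3,T}, NL)
NNlib.scatter!(+, dst_vec, src, idx)

# Layout 2: the same data viewed as a 3×NL matrix of plain Float32.
# idx then selects columns of the destination matrix.
src_mat = reinterpret(reshape, T, src)
dst_mat = zeros(T, 3, NL)
NNlib.scatter!(+, dst_mat, src_mat, idx)

# Both layouts should accumulate the same sums.
@assert reinterpret(reshape, T, dst_vec) ≈ dst_mat
```

On the CPU both layouts run and agree, so the failure seems specific to the GPU kernel.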

Kind regards