Hi,

I am curious whether there is a way to do atomic operations on complex-valued arrays in a `CUDA` kernel, e.g.

```
using CUDA

function kernel!(x::CuDeviceArray{T}, y::CuDeviceArray{T}) where T
    # global (1-based) thread index
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        CUDA.atomic_add!(pointer(x, i), y[i])
    end
    return nothing
end

function test(T)
    x = CuArray(rand(T, 2, 2))
    y = CuArray(rand(T, 2, 2))
    threads = 100
    blocks = cld(length(x), threads)
    @cuda threads=threads blocks=blocks kernel!(x, y)
end
```

This works if I run `test(Float32)`, but not if I run `test(ComplexF32)`. I know this is because the atomics do not support complex numbers. I can certainly split a complex atomic operation into two atomics (one on the real part and the other on the imaginary part). But then, for example, I do not know how to pass a pointer to the real part of the i-th element of `x`, i.e. `x[i].re`.

In C++, I am able to do this by simply using `atomicAdd(&x[i].x, y[i].x)`. However, I found that in Julia `CUDA`, both assigning to `x[i].re` and taking a pointer to it are troublesome (I ran into errors regarding `setindex!`).
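The closest workaround I can think of is to reinterpret the complex arrays as real arrays on the host, so that the real and imaginary parts become separately addressable elements. Below is a rough sketch of that idea. It assumes that `reinterpret(T, x)` on a `CuArray{Complex{T}}` returns an array sharing memory with `x`, laid out as interleaved `[re, im, re, im, ...]`; the names `kernel_split!` and `test_split` are just made up for this example, and I am not sure this is the intended way to do it.

```julia
using CUDA

# xr and yr are real-valued views of the complex arrays (assumed layout:
# re1, im1, re2, im2, ...), so complex element i has its real part at
# linear index 2i-1 and its imaginary part at 2i. n is the number of
# complex elements.
function kernel_split!(xr::CuDeviceArray{T}, yr::CuDeviceArray{T}, n) where T
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= n
        CUDA.atomic_add!(pointer(xr, 2i - 1), yr[2i - 1])  # real part
        CUDA.atomic_add!(pointer(xr, 2i), yr[2i])          # imaginary part
    end
    return nothing
end

function test_split(::Type{Complex{T}}) where T
    x = CuArray(rand(Complex{T}, 2, 2))
    y = CuArray(rand(Complex{T}, 2, 2))
    expected = Array(x) .+ Array(y)

    # Assumption: these share memory with x and y rather than copying.
    xr = reinterpret(T, x)
    yr = reinterpret(T, y)

    threads = 100
    blocks = cld(length(x), threads)
    @cuda threads=threads blocks=blocks kernel_split!(xr, yr, length(x))

    # Compare the updated complex array against the host-side sum.
    return Array(x) ≈ expected
end
```

I would call this as `test_split(ComplexF32)`. Note that the two adds are each atomic, but the complex update as a whole is not, so another thread could briefly see a new real part with an old imaginary part. For a pure accumulation that should not matter.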

One may suggest using shared memory to accumulate the sum first, but in my real code the order of access to `y` is not simple, so using shared memory is not straightforward.

I would like to ask whether what I want to do is actually impossible, or whether there is a way to tackle this.

Thanks.