Hi,
I am curious whether there is a way to do atomic operations on complex-valued arrays in a CUDA
kernel, e.g.

    using CUDA

    function kernel!(x::CuDeviceArray{T}, y::CuDeviceArray{T}) where T
        i = (blockIdx().x-1)*blockDim().x + threadIdx().x
        if i <= length(x)
            CUDA.atomic_add!(pointer(x, i), y[i])
        end
        return nothing
    end

    function test(T)
        x = CuArray(rand(T, 2, 2))
        y = CuArray(rand(T, 2, 2))
        threads = 100
        blocks = cld(length(x), threads)
        @cuda threads=threads blocks=blocks kernel!(x,y)
    end

This works if I run test(Float32), but not if I run test(ComplexF32). I know this is because the atomics do not support complex numbers. I could certainly split a complex atomic operation into two atomics (one on the real part and the other on the imaginary part). But then, for example, I do not know how to pass a pointer to the real part of the i-th element of x, i.e. x[i].re.
In C++, I am able to do this by simply using atomicAdd(&x[i].x, y[i].x). However, I found that in Julia with CUDA, both assigning to x[i].re and taking a pointer to it are troublesome (I ran into errors regarding setindex!).
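
To make the split I have in mind concrete, here is a minimal CPU-only sketch (no GPU involved). It assumes only that a ComplexF32 array stores each element as a contiguous (re, im) pair of Float32 values, so the real part of x[i] sits at index 2i-1 and the imaginary part at index 2i of a Float32 view. The names xr and yr are just illustrative, and the plain += updates stand in for the two real-valued atomics.

```julia
# CPU-only sketch: view complex arrays as interleaved Float32 data.
x = ComplexF32[1 + 2im, 3 + 4im]
y = ComplexF32[10 + 20im, 30 + 40im]

# reinterpret shares memory with the original array;
# each view has 2 * length(x) Float32 elements.
xr = reinterpret(Float32, x)
yr = reinterpret(Float32, y)

for i in eachindex(x)
    # Two real-valued updates replace one complex-valued update.
    xr[2i - 1] += yr[2i - 1]   # real part of x[i]
    xr[2i]     += yr[2i]       # imaginary part of x[i]
end
```

Whether the same kind of view can be obtained for a CuDeviceArray is exactly the open point. An untested assumption is that calling reinterpret(Float32, x) on the CuArray before the kernel launch gives a Float32 array over the same memory, in which case the Float32 atomics could be applied at indices 2i-1 and 2i.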
One might suggest accumulating the sum in shared memory first, but in my real code the order of access to y is not simple, so using shared memory is not straightforward.
I would like to ask whether what I want to do is actually impossible, or whether there is a way to tackle it.
Thanks.