I haven’t thought too much about what you want to do, but the way to use @atomic is as shown here:
using CUDA

function count_indices(indices::CuArray{Int64,1}, maxSize::Int64)
    # Initialize a CuArray of zeros with size maxSize
    counts = CUDA.zeros(Int64, maxSize)

    # Define the kernel function: one thread per entry of `indices`
    function kernel(indices, counts)
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if idx <= length(indices)
            # Several threads may hit the same bin, so the increment must be atomic
            CUDA.@atomic counts[indices[idx]] += 1
        end
        return
    end

    # Launch the kernel with enough blocks to cover every index
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks kernel(indices, counts)

    return counts
end

count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)
With result:

julia> count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 6
 1
 4
 0
Some tips:
- Use CuVector instead of CuArray.
- Use Int instead of spelling out a specific IntXX.
- If you are going to call this function a lot, preallocate counts outside and make an in-place function, count_indices!. You can then still define the original function, which does everything at once, on top of it.
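To make the tips concrete, here is a rough sketch of what the in-place variant could look like. The names count_kernel! and count_indices! are just my suggestions, and the fill! reset and the empty-input guard are assumptions on my part about what you would want; I have not benchmarked this.

```julia
using CUDA

# Kernel: each thread handles one entry of `indices` and atomically bumps its bin.
function count_kernel!(counts, indices)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(indices)
        CUDA.@atomic counts[indices[idx]] += 1
    end
    return
end

# In-place version: the caller owns `counts`, so repeated calls reuse the same buffer.
function count_indices!(counts::CuVector{Int}, indices::CuVector{Int})
    fill!(counts, 0)                    # assumption: start from zero on every call
    isempty(indices) && return counts   # avoid launching a kernel with zero blocks
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks count_kernel!(counts, indices)
    return counts
end

# Allocating convenience wrapper that does everything at once.
count_indices(indices::CuVector{Int}, maxSize::Int) =
    count_indices!(CUDA.zeros(Int, maxSize), indices)
```

You would then allocate once with counts = CUDA.zeros(Int, 4) and call count_indices!(counts, indices) as often as needed; for the example input above this should again give 6, 1, 4, 0.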
Kind regards