Adding at specific CuArray position

How can I fix this function?
Also, how do I get the commented-out line #@atomic counts[indices[idx]] += 1 to work? It doesn't compile at all.

function count_indices(indices::CuArray{Int64,1}, maxSize::Int64)
    # Initialize a CuArray of zeros with size maxSize
    counts = CUDA.zeros(Int64, maxSize)

    # Define the kernel function
    function kernel(indices, counts)
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if idx <= length(indices)
            #@atomic counts[indices[idx]] += 1
            CUDA.atomic_add!(counts, indices[idx], 1)
        end
        return
    end

    # Launch the kernel
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks kernel(indices, counts)

    return counts
end


count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)

This is the error I get in Julia v1.9.1:
ERROR: InvalidIRError: compiling MethodInstance for (::var"kernel#7")(::CuDeviceVector{Int64, 1}, ::CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_add!)

I haven’t thought too much about what you want to do, but the way to use @atomic is as shown here:

function count_indices(indices::CuArray{Int64,1}, maxSize::Int64)
    # Initialize a CuArray of zeros with size maxSize
    counts = CUDA.zeros(Int64, maxSize)

    # Define the kernel function
    function kernel(indices, counts)
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if idx <= length(indices)
            CUDA.@atomic counts[indices[idx]] += 1
        end
        return
    end

    # Launch the kernel
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks kernel(indices, counts)

    return counts
end


count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)

With result:

julia> count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 6
 1
 4
 0
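As a sanity check on the output above, the same counts can be reproduced on the CPU. This is only a sketch for cross-checking; the name count_indices_cpu is made up for illustration and is not part of CUDA.jl.

```julia
# Hypothetical CPU reference: plain sequential counting, no atomics needed.
function count_indices_cpu(indices::AbstractVector{<:Integer}, maxSize::Integer)
    counts = zeros(Int, maxSize)
    for i in indices
        counts[i] += 1   # each index value increments its own slot
    end
    return counts
end

expected = count_indices_cpu([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1], 4)
```

Comparing Array(count_indices(...)) against this result is a cheap way to confirm that the atomic updates on the GPU did not lose any increments.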

Some tips:

  1. Use CuVector{T} instead of CuArray{T,1}
  2. Use Int instead of a specific IntXX type
  3. If you are going to call this function a lot, preallocate counts outside and write an in-place function, count_indices!. You can then still define count_indices as a wrapper that allocates counts and calls the in-place version
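The three tips could be combined roughly as follows. This is a sketch, not a definitive implementation: the names count_kernel! and count_indices! are illustrative, and resetting counts with fill! on every call is an assumption about the desired behavior (drop it if you want to accumulate across calls).

```julia
using CUDA

# Kernel defined at top level so it is not recreated on every call.
function count_kernel!(counts, indices)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(indices)
        CUDA.@atomic counts[indices[idx]] += 1
    end
    return
end

# In-place version: the caller owns and reuses `counts`.
function count_indices!(counts::CuVector{Int}, indices::CuVector{Int})
    fill!(counts, 0)                      # assumption: start from zero each call
    isempty(indices) && return counts     # avoid launching with zero blocks
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks count_kernel!(counts, indices)
    return counts
end

# Allocating wrapper that "does everything at once".
count_indices(indices::CuVector{Int}, maxSize::Int) =
    count_indices!(CUDA.zeros(Int, maxSize), indices)
```

With this split, a hot loop can allocate counts once and call count_indices!(counts, indices) repeatedly, while one-off callers keep using count_indices.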

Kind regards

It works fine, Thank you!


FWIW, the underlying issue was probably that atomic_add! (and all other low-level atomic intrinsics) are really strict wrt. which types of arguments they accept, while CUDA.@atomic performs automatic conversions.
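To illustrate the difference, here is a sketch of how the low-level intrinsic could be called directly. It assumes the pointer-based form CUDA.atomic_add!(ptr, val), where val must have exactly the element type of the array; the kernel name kernel_lowlevel is made up for this example. Unlike CUDA.@atomic, this form takes a pointer rather than an array and an index, and it does no bounds checking.

```julia
using CUDA

function kernel_lowlevel(indices, counts)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(indices)
        # pointer(counts, i) addresses element i; the value must match
        # eltype(counts) exactly, hence the explicit Int64(1).
        CUDA.atomic_add!(pointer(counts, indices[idx]), Int64(1))
    end
    return
end

indices = CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1])
counts = CUDA.zeros(Int64, 4)
@cuda threads=256 blocks=cld(length(indices), 256) kernel_lowlevel(indices, counts)
```

In most code CUDA.@atomic remains the more convenient choice, since it handles the indexing and conversions for you.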

Is there a way to get CUDA.@atomic add to work if the underlying values are SVectors?

I have not had success with that.

I was also doing the same today. Seems like GPUs in general cannot atomically add to multiple values at once.

If you have a vector X of SVector{3,Float32} values, you can use reshape(reinterpret(Float32, X), 3, length(X)) to get a 3×length(X) matrix over the same data, which is suitable for componentwise atomic operations.
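A minimal sketch of that workaround, assuming X holds SVector{3,Float32} elements from StaticArrays; the kernel name add_kernel! and the targets/contributions arrays are invented for illustration. Note that each component is updated by its own atomic operation, so the update of a whole SVector is not atomic as a unit.

```julia
using CUDA, StaticArrays

# Adds contributions[idx] to column targets[idx] of A, one component at a time.
function add_kernel!(A, targets, contributions)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(targets)
        t = targets[idx]
        v = contributions[idx]
        for c in 1:3
            CUDA.@atomic A[c, t] += v[c]
        end
    end
    return
end

X = CuArray(fill(SVector{3,Float32}(0, 0, 0), 4))
A = reshape(reinterpret(Float32, X), 3, length(X))   # shares memory with X

targets = CuArray([1, 2, 1, 3])
contributions = CuArray(fill(SVector{3,Float32}(1, 2, 3), 4))
@cuda threads=256 blocks=1 add_kernel!(A, targets, contributions)
```

After the kernel runs, X itself holds the accumulated values, so the rest of the code can keep working with SVectors and only the accumulation step goes through the Float32 matrix.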

Thanks! That kind of removes the perk of using SVector, but it is a possible workaround, yes.
