Adding at specific CuArray position

Andrea_Deflorio · May 5, 2024, 8:39am

How can I fix this function?
Also, how can I fix the commented-out line #@atomic counts[indices[idx]] += 1? It doesn’t compile at all.

function count_indices(indices::CuArray{Int64,1}, maxSize::Int64)
    # Initialize a CuArray of zeros with size maxSize
    counts = CUDA.zeros(Int64, maxSize)

    # Define the kernel function
    function kernel(indices, counts)
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if idx <= length(indices)
            #@atomic counts[indices[idx]] += 1
            CUDA.atomic_add!(counts, indices[idx], 1)
        end
        return
    end

    # Launch the kernel
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks kernel(indices, counts)

    return counts
end


count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)

This is the error I get in Julia v1.9.1:
ERROR: InvalidIRError: compiling MethodInstance for (::var"kernel#7")(::CuDeviceVector{Int64, 1}, ::CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_add!)

Ahmed_Salih · May 5, 2024, 9:05am

I haven’t thought too much about what you want to do, but the way to use @atomic is as shown here:

function count_indices(indices::CuArray{Int64,1}, maxSize::Int64)
    # Initialize a CuArray of zeros with size maxSize
    counts = CUDA.zeros(Int64, maxSize)

    # Define the kernel function
    function kernel(indices, counts)
        idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if idx <= length(indices)
            CUDA.@atomic counts[indices[idx]] += 1
        end
        return
    end

    # Launch the kernel
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks kernel(indices, counts)

    return counts
end


count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)

With result:

julia> count_indices(CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1]), 4)
4-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 6
 1
 4
 0

Some tips:

Use CuVector instead of CuArray{T,1}.
Use Int instead of spelling out a specific IntXX.
If you are going to call this function a lot, preallocate counts outside and write an in-place function, count_indices!. You can then still define the allocating count_indices as a thin wrapper that does everything at once.
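
The tips above could be sketched roughly as follows. This is only one possible layout, not a definitive implementation: the names count_kernel! and count_indices! are illustrative, the kernel is moved to top level, and it needs a working CUDA GPU to run.

```julia
using CUDA

# Kernel: one thread per entry of `indices`, atomically bumping the matching counter.
function count_kernel!(counts, indices)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(indices)
        CUDA.@atomic counts[indices[idx]] += 1
    end
    return
end

# In-place version: reuses a preallocated `counts` buffer (it is reset to zero first).
function count_indices!(counts::CuVector{Int}, indices::CuVector{Int})
    fill!(counts, 0)
    isempty(indices) && return counts   # avoid launching a kernel with 0 blocks
    threads = 256
    blocks = cld(length(indices), threads)
    @cuda threads=threads blocks=blocks count_kernel!(counts, indices)
    return counts
end

# Allocating convenience wrapper that does everything at once.
count_indices(indices::CuVector{Int}, maxSize::Int) =
    count_indices!(CUDA.zeros(Int, maxSize), indices)

counts = CUDA.zeros(Int, 4)                       # allocate once
indices = CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1])
count_indices!(counts, indices)                   # reuse on every call
```

With the in-place form, repeated calls only pay for the fill! and the kernel launch, not for a fresh allocation.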

Kind regards

Andrea_Deflorio · May 5, 2024, 11:23am

It works fine, Thank you!

maleadt · May 5, 2024, 5:54pm

FWIW, the underlying issue was probably that atomic_add! (and all other low-level atomic intrinsics) are really strict wrt. which types of arguments they accept, while CUDA.@atomic performs automatic conversions.
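
To illustrate the strictness, here is a hedged sketch of how the low-level intrinsic can be called directly. It assumes the pointer-based form documented in CUDA.jl, atomic_add!(ptr, val), where val must have exactly the element type of the pointer; the kernel name kernel_lowlevel is made up for this example, and a CUDA GPU is required. Passing the array and an index as separate arguments, as in the original post, matches no method, which would explain the "dynamic function invocation" error.

```julia
using CUDA

function kernel_lowlevel(indices, counts)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= length(indices)
        # pointer(counts, i) yields a device pointer to element i;
        # the value must be an Int64 to match the Int64 element type.
        CUDA.atomic_add!(pointer(counts, indices[idx]), Int64(1))
    end
    return
end

indices = CuArray([1, 2, 3, 1, 3, 3, 3, 1, 1, 1, 1])
counts = CUDA.zeros(Int64, 4)
@cuda threads=256 blocks=cld(length(indices), 256) kernel_lowlevel(indices, counts)
result = Array(counts)   # copying back to the host waits for the kernel
```

For everyday code CUDA.@atomic remains the more convenient choice, since it handles the indexing and conversions itself.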

Ahmed_Salih · May 6, 2024, 12:46pm

Is there a way to get CUDA.@atomic add to work if the underlying values are SVectors?

I have not had success with that.

SteffenPL · May 6, 2024, 1:34pm

I was also doing the same today. Seems like GPUs in general cannot atomically add to multiple values at once.

If you have a vector X of SVectors (here with 3 Float32 components each), you can use reshape(reinterpret(Float32, X), 3, length(X)) to get a 3×N matrix over the same memory that is suitable for componentwise atomic operations.
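
A minimal sketch of that idea, assuming SVector{3,Float32} elements from StaticArrays and a CUDA GPU; the kernel name scatter_add_kernel! and the variables targets and contribs are invented for the illustration. Each component is added with its own atomic operation, so the three updates of one SVector are not atomic as a group.

```julia
using CUDA, StaticArrays

# Adds contribs[i] to the SVector stored at position targets[i], one component at a time.
function scatter_add_kernel!(A, targets, contribs)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(targets)
        j = targets[i]
        v = contribs[i]
        for c in 1:3
            CUDA.@atomic A[c, j] += v[c]
        end
    end
    return
end

X = CuArray(fill(SVector{3,Float32}(0, 0, 0), 2))
A = reshape(reinterpret(Float32, X), 3, length(X))   # 3×2 Float32 matrix over X's memory

targets = CuArray([1, 2, 1, 1])
contribs = CuArray(fill(SVector{3,Float32}(1, 2, 3), 4))
@cuda threads=256 blocks=1 scatter_add_kernel!(A, targets, contribs)

result = Array(X)   # read the accumulated SVectors back on the host
```

The rest of the program can keep working with X as a vector of SVectors; only the accumulation kernel needs the flat view A.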

Ahmed_Salih · May 6, 2024, 1:58pm

Thanks! That kind of removes the perk of using SVector, though, but it is a possible workaround, yes.
