Hello,
I want to sort particles into a 10x10 grid. To do this, I would like to use CUDA.atomic_add!()
to count how many particles land in each cell and, from the value it returns, decide where to store each particle's index.
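To make the intended result concrete, here is a serial CPU sketch of the binning I am after. The function name `bin_particles_cpu` and its keyword arguments are made up for illustration; this is plain Julia, not the GPU code.

```julia
# Serial CPU reference for the binning (illustrative sketch only).
# `bin_particles_cpu`, `ncells` and `max_per_cell` are hypothetical names, not part of CUDA.jl.
function bin_particles_cpu(positions::AbstractMatrix{<:Real}; ncells::Int = 10, max_per_cell::Int = 50)
    counts = zeros(Int, ncells, ncells)               # number of particles per cell
    slots = zeros(Int, ncells, ncells, max_per_cell)  # particle indices stored per cell
    for i in axes(positions, 1)
        # Convert the position to 1-based (x, y) cell indices
        x = floor(Int, positions[i, 1]) + 1
        y = floor(Int, positions[i, 2]) + 1
        n = counts[x, y]          # previous count: the value I need to get back
        counts[x, y] = n + 1
        slots[x, y, n + 1] = i    # particle i takes the next free slot of its cell
    end
    return counts, slots
end

# Three particles: the first two fall into cell (1, 1), the third into cell (4, 9)
positions = [0.5 0.5; 0.2 0.9; 3.7 8.1]
counts, slots = bin_particles_cpu(positions)
```

On the GPU, the read of the previous count and the increment have to happen as one atomic step, which is why I am looking at atomic_add!.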
Here is a minimal CUDA.jl example:
using CUDA

function kernel_test_atomicadd!(arr, N, positions)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    # Convert positions to (x, y) grid indices
    x = floor(Int, positions[i, 1]) + 1
    y = floor(Int, positions[i, 2]) + 1
    linear_idx = (y - 1) * 10 + x
    # Atomically increment N[x, y], storing the previous value
    n = CUDA.atomic_add!(CUDA.pointer(N, linear_idx), 1)
    # Add the index i of the particle in arr[x,y,n+1]
    arr[x, y, n + 1] = i
    return nothing
end

# Create a 10x10 grid to hold particle indices (max 50 per cell)
arr = CUDA.zeros(10, 10, 50);
# Initialize counter for particles per cell
N = CUDA.zeros(10, 10);
# Random particle positions in the 10x10 space
positions = CUDA.rand(256, 2) * 10.0;
@cuda threads=256 kernel_test_atomicadd!(arr, N, positions)
I get this error message:
Reason: unsupported dynamic function invocation (call to atomic_add!)
If I replace the pointer() call with CUDA.Ref(N[x,y]), I get the same error.
I think I cannot use @atomic because I need to read the value of N[x,y] before adding 1. Is that correct?
# This works, but it is not what I am looking for:
# the read of N[x,y] is a separate step, not part of the atomic update
n = N[x,y]
@atomic N[x,y] += 1
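What I need instead is a fetch-and-add: one operation that increments the counter and hands back the previous value, so that no two threads can end up with the same n. As a CPU-only illustration of that behaviour (using Base.Threads, not CUDA.jl; the names `counter` and `slots` are just for this sketch):

```julia
using Base.Threads

counter = Atomic{Int}(0)   # shared counter, plays the role of N[x, y]
slots = zeros(Int, 8)      # plays the role of arr[x, y, :]

@threads for i in 1:8
    # atomic_add! increments the counter and returns the value it held before
    n = atomic_add!(counter, 1)
    # Every iteration receives a distinct n, so no slot is written twice
    slots[n + 1] = i
end
```

The order in which the indices land in `slots` depends on thread scheduling, but each of 1 to 8 appears exactly once. This is the behaviour I would like to get for N[x,y] inside the kernel.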
Thank you for your help!
julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 552.86.0
CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+552.86
Julia packages:
- CUDA: 5.6.1
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0
Toolchain:
- Julia: 1.11.5
- LLVM: 16.0.6
1 device:
0: NVIDIA RTX 2000 Ada Generation (sm_89, 11.572 GiB / 15.996 GiB available)