Hello,
I am currently porting my code from CUDA.jl to ParallelStencil.jl. The transition for mixed Float64 and ComplexF64 types went smoothly, but I have hit a roadblock replacing CUDA-specific atomic operations such as:
```julia
CUDA.@atomic F[i, j, 1] += ...
```
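For context, here is a trimmed-down, hypothetical version of the CUDA.jl kernel I am porting (array names and the reduction pattern are placeholders, not my actual code):

```julia
using CUDA

# Hypothetical reduced kernel: threads with different k all accumulate
# into the same (i, j, 1) entry of F, so the update must be atomic.
function accumulate_kernel!(F, src)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z
    if i <= size(src, 1) && j <= size(src, 2) && k <= size(src, 3)
        # Concurrent += on the same memory location: needs CUDA.@atomic.
        CUDA.@atomic F[i, j, 1] += src[i, j, k]
    end
    return nothing
end
```

This pattern is what I now need to express in a ParallelStencil kernel.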
Based on the KernelAbstractions.jl documentation, I am considering using Atomix.jl, but I am unclear on how the integration works in practice. Specifically:
- If I initialize my setup with `@init_parallel_stencil(CUDA, Float64, 2, inbounds=true)`, will Atomix.jl automatically use the correct CUDA atomic instructions?
- Are there specific steps or wrappers required for Atomix.jl to work seamlessly within a ParallelStencil kernel?
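For concreteness, here is a minimal sketch of what I imagine the ported version would look like, assuming `Atomix.@atomic` can be used directly inside a `@parallel_indices` kernel (kernel and array names are placeholders; this assumption is exactly what I am asking about):

```julia
using ParallelStencil
using Atomix
@init_parallel_stencil(CUDA, Float64, 2, inbounds=true)

# Hypothetical port of my CUDA.jl kernel: each (i, j) work item
# accumulates into a shared output plane F[:, :, 1].
@parallel_indices (i, j) function accumulate!(F, src)
    # Assumption: Atomix.@atomic lowers to the proper CUDA atomic
    # instruction when the CUDA backend is selected.
    Atomix.@atomic F[i, j, 1] += src[i, j]
    return
end
```

Is this the intended way to combine the two packages, or is some extra wrapping needed?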
I have looked through the Atomix.jl documentation, but it is quite sparse. Any guidance or examples would be greatly appreciated!
Best regards