Understanding Shuffle in CUDA

Hi everyone,
I am trying to use shuffle in a CUDA kernel. Can someone explain how to use CUDA.shfl_sync, CUDA.shfl_up_sync, CUDA.shfl_down_sync, and CUDA.shfl_xor_sync
with a simple example? It's not clear from the documentation.
