Unexpected coalesced group behaviour in CUDA.jl

maleadt · January 24, 2025, 11:50am

That’s because thread 8 doesn’t participate in the shuffle: for thread 8, `higher_cg_lane` is 9, so it never calls `CG.shfl`.

You’re maybe better off keeping all threads participating in the shuffle, and doing something like the examples in the NVIDIA Technical Blog post “Cooperative Groups: Flexible CUDA Thread Programming”?
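For illustration, here is a rough sketch of what "keeping all threads participating" could look like. It is not the code from the original question: the kernel name, the arrays, and the clamping of the source rank are assumptions made for the example, and it assumes ranks are 1-based (as the "thread 8 gets 9" behaviour suggests) and that `CG.shfl` accepts a plain integer source rank. The idea is that every thread calls `CG.shfl`, and only the use of the result is conditional.

```julia
using CUDA
using CUDA: CG

# Hypothetical kernel: each thread fetches the value held by the next-higher
# rank in its coalesced group. The last rank has no higher neighbour, so it
# shuffles from itself and discards the result, but it still calls the shuffle.
function neighbour_kernel(out, inp)
    i = threadIdx().x
    cg = CG.coalesced_threads()
    rank = CG.thread_rank(cg)      # assumed 1-based
    n = CG.num_threads(cg)

    has_higher = rank < n
    # Clamp the source so that every thread passes a valid rank.
    src = has_higher ? rank + 1 : rank

    # Unconditional: all threads in the group take part in the shuffle.
    val = CG.shfl(cg, inp[i], src)

    # Only the use of the shuffled value depends on the rank.
    out[i] = has_higher ? val : inp[i]
    return
end

inp = CuArray(Int32.(1:8))
out = CUDA.zeros(Int32, 8)
@cuda threads=8 neighbour_kernel(out, inp)
```

The design point is simply to move the branch from around the shuffle to after it, so that no thread a neighbour wants to read from has skipped the call.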
