How to understand MapReduce

Hello, I am analyzing the CUDA.jl mapreduce.jl file, and there are two things I do not understand.

First, I see that the function shfl_down_sync is only used when the element type is one of Bool, Int32, Int64, Float32, Float64, ComplexF32, or ComplexF64, and is avoided otherwise. Yet the CUDA documentation says:
"With the cuda_fp16.h header included, T can also be __half or __half2. Similarly, with the cuda_bf16.h header included, T can also be __nv_bfloat16 or __nv_bfloat162. "

  • Can this header be included in CUDA.jl?
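One reason 16-bit types need extra support is that the warp-shuffle hardware moves 32-bit registers; the __half overloads work by widening the value into a 32-bit word before shuffling. A CPU-side Python sketch of that idea (a simulation with hypothetical helper names, not CUDA.jl's actual implementation):

```python
# Sketch: shuffle a 16-bit half value by widening it to a 32-bit word,
# shuffling the word, and truncating back. Helper names are illustrative.
import struct

def float16_to_bits(x: float) -> int:
    """Reinterpret a float as the bits of an IEEE 754 half (struct code 'e')."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_to_float16(b: int) -> float:
    return struct.unpack("<e", struct.pack("<H", b))[0]

def shfl_down_32(lanes: list[int], delta: int) -> list[int]:
    """Simulate __shfl_down_sync on words: lane i receives the value
    from lane i + delta (lanes past the end keep their own value)."""
    n = len(lanes)
    return [lanes[i + delta] if i + delta < n else lanes[i] for i in range(n)]

# Shuffle half-precision data by round-tripping through wider words.
values = [1.5, -2.0, 0.25, 8.0]
words = [float16_to_bits(v) for v in values]
shuffled = shfl_down_32(words, 1)
result = [bits_to_float16(w) for w in shuffled]
print(result)  # → [-2.0, 0.25, 8.0, 8.0]
```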


Secondly, in the documentation (Programming Guide :: CUDA Toolkit Documentation), __shfl_xor_sync() is suggested for reduction. If I understand correctly, why is shfl_down_sync better in this use case?
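The two intrinsics implement different reduction shapes: __shfl_down_sync builds a tree whose final sum lands only in lane 0, while __shfl_xor_sync performs a butterfly exchange that leaves the full sum in every lane. A CPU-side Python simulation of both patterns (not GPU code, just the lane arithmetic):

```python
# Simulate a 32-lane warp summing one value per lane with each pattern.
WARP = 32

def shfl_down(vals, delta):
    # Lane i reads lane i + delta; lanes past the end keep their own value.
    return [vals[i + delta] if i + delta < WARP else vals[i] for i in range(WARP)]

def shfl_xor(vals, mask):
    # Lane i reads lane i XOR mask (always in range for power-of-two masks).
    return [vals[i ^ mask] for i in range(WARP)]

def reduce_down(vals):
    vals = list(vals)
    offset = WARP // 2
    while offset > 0:
        vals = [a + b for a, b in zip(vals, shfl_down(vals, offset))]
        offset //= 2
    return vals  # correct total only in lane 0

def reduce_xor(vals):
    vals = list(vals)
    mask = WARP // 2
    while mask > 0:
        vals = [a + b for a, b in zip(vals, shfl_xor(vals, mask))]
        mask //= 2
    return vals  # total replicated in every lane

data = list(range(WARP))      # 0 + 1 + ... + 31 == 496
print(reduce_down(data)[0])   # → 496
print(set(reduce_xor(data)))  # → {496}
```

Both take the same number of shuffle steps; the difference is that the butterfly broadcasts the result to all lanes, whereas the down-shuffle version only guarantees it in lane 0, which is enough when a single thread goes on to write the partial result.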

For reference

CUDA.jl calls the PTX intrinsics directly. You could check CUDA.jl/warp_shuffle.jl at d87ee1cb4049ad45cb5d5b29fd5e872901ee2878 · JuliaGPU/CUDA.jl · GitHub.