How to understand MapReduce

Hello, I am analyzing the CUDA.jl `mapreduce.jl` file and I see two things that I do not understand.

First, I see that the function `shfl_down_sync` is avoided when the element type is anything other than Bool, Int32, Int64, Float32, Float64, ComplexF32, or ComplexF64. However, the CUDA C documentation states:

> With the cuda_fp16.h header included, T can also be __half or __half2. Similarly, with the cuda_bf16.h header included, T can also be __nv_bfloat16 or __nv_bfloat162.

- Can this header be included in CUDA.jl?


Secondly, in the documentation (Programming Guide :: CUDA Toolkit Documentation), `__shfl_xor_sync()` is suggested for reductions. If I understand this correctly, why is `shfl_down_sync` better in this use case?
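To make the difference between the two patterns concrete, here is a plain-Python simulation of a 32-lane warp (an illustrative sketch, not CUDA.jl code; the helper names `shfl_down` and `shfl_xor` are my own, modeled on the semantics of `__shfl_down_sync` and `__shfl_xor_sync`). The down-shuffle tree leaves the complete sum only in lane 0, while the xor "butterfly" leaves it in every lane, which is why the Programming Guide suggests it when all lanes need the result:

```python
WARP_SIZE = 32

def shfl_down(vals, delta):
    """Each lane i reads the value from lane i + delta; lanes whose
    source would fall outside the warp keep their own value,
    mirroring __shfl_down_sync semantics."""
    return [vals[i + delta] if i + delta < len(vals) else vals[i]
            for i in range(len(vals))]

def shfl_xor(vals, mask):
    """Each lane i reads the value from lane i XOR mask (butterfly),
    mirroring __shfl_xor_sync semantics."""
    return [vals[i ^ mask] for i in range(len(vals))]

lanes = list(range(WARP_SIZE))   # lane i starts with value i
expected = sum(lanes)            # 0 + 1 + ... + 31 = 496

# Down-shuffle tree: after log2(32) = 5 steps, only lane 0 holds the sum.
down = lanes[:]
delta = WARP_SIZE // 2
while delta > 0:
    partial = shfl_down(down, delta)
    down = [a + b for a, b in zip(down, partial)]
    delta //= 2
print(down[0])   # full sum, valid in lane 0 only

# Butterfly (xor) shuffle: every lane ends up with the full sum.
xor = lanes[:]
mask = WARP_SIZE // 2
while mask > 0:
    partial = shfl_xor(xor, mask)
    xor = [a + b for a, b in zip(xor, partial)]
    mask //= 2
print(xor[0], xor[31])   # full sum in every lane
```

For a mapreduce kernel, presumably only one lane per warp needs to write the partial result out, so the down-shuffle variant does the job with the same number of steps and no need to broadcast the result to all lanes.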

For reference:
https://github.com/JuliaGPU/CUDA.jl/blob/afe81794038dddbda49639c8c26469496543d831/src/mapreduce.jl

CUDA.jl calls the PTX shuffle intrinsics directly. You could check https://github.com/JuliaGPU/CUDA.jl/blob/d87ee1cb4049ad45cb5d5b29fd5e872901ee2878/src/device/intrinsics/warp_shuffle.jl#L40-L73.
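Regarding the restricted type list: the hardware shuffle moves 32-bit registers, and wider supported types (e.g. Float64) are typically lowered to a pair of 32-bit shuffles; types with no such lowering need a different reduction path (hence the fallback in `mapreduce.jl`). A rough Python sketch of the two-halves idea, with my own hypothetical helper names:

```python
def split64(x):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return x & 0xFFFFFFFF, (x >> 32) & 0xFFFFFFFF

def join64(lo, hi):
    """Reassemble a 64-bit value from its 32-bit words."""
    return (hi << 32) | lo

def shfl_down_32(vals, delta):
    """32-bit shuffle-down over a simulated warp, as before."""
    return [vals[i + delta] if i + delta < len(vals) else vals[i]
            for i in range(len(vals))]

# Four lanes holding distinct 64-bit values.
lanes = [(i + 1) * 0x1_0000_0001 for i in range(4)]

# Shuffle each 32-bit half independently, then reassemble.
los = [split64(v)[0] for v in lanes]
his = [split64(v)[1] for v in lanes]
shuffled = [join64(lo, hi)
            for lo, hi in zip(shfl_down_32(los, 1), shfl_down_32(his, 1))]
print(shuffled)   # each lane now holds its neighbor's 64-bit value
```

This is only an illustration of the lowering strategy; the actual CUDA.jl implementation in `warp_shuffle.jl` generates the corresponding PTX instructions.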