How to understand MapReduce

Jakub_Mitura · August 31, 2021, 10:50am

Hello, I am analyzing the mapreduce.jl file in CUDA.jl and there are two things that I do not understand.

First, I see that shfl_down_sync is avoided when the data type is anything other than Bool, Int32, Int64, Float32, Float64, ComplexF32, or ComplexF64. However, the CUDA Programming Guide says:
"With the cuda_fp16.h header included, T can also be __half or __half2. Similarly, with the cuda_bf16.h header included, T can also be __nv_bfloat16 or __nv_bfloat162."

Can these headers be included in CUDA.jl?


Second, if I understand correctly, the documentation (Programming Guide :: CUDA Toolkit Documentation) suggests __shfl_xor_sync() for reductions. Why is shfl_down_sync better in this use case?

For reference
https://github.com/JuliaGPU/CUDA.jl/blob/afe81794038dddbda49639c8c26469496543d831/src/mapreduce.jl

Jakub_Mitura · August 31, 2021, 11:23am

CUDA.jl calls the PTX-level shuffle intrinsics directly, so the CUDA C headers are not involved. You can check the definitions at https://github.com/JuliaGPU/CUDA.jl/blob/d87ee1cb4049ad45cb5d5b29fd5e872901ee2878/src/device/intrinsics/warp_shuffle.jl#L40-L73.
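
On the second question, it may help to see what each shuffle pattern leaves in the lanes after a warp-level reduction. The following is only a CPU simulation in plain Julia, not CUDA.jl code: the helpers shfl_down, shfl_xor, reduce_down and reduce_xor are hypothetical names written for this sketch, and a "warp" is just a 32-element vector. It assumes the usual device behaviour that a lane whose shfl_down source is out of range gets its own value back.

```julia
# CPU-only simulation of a warp as a plain vector (hypothetical helpers, no GPU needed).
# The vector length must be a power of two, e.g. 32 lanes.

# Lane i reads from lane i + delta; lanes whose source would fall outside
# the warp keep their own value.
shfl_down(vals, delta) =
    [i + delta <= length(vals) ? vals[i + delta] : vals[i] for i in eachindex(vals)]

# Lane i reads from the lane whose 0-based index is (i - 1) xor mask.
shfl_xor(vals, mask) = [vals[xor(i - 1, mask) + 1] for i in eachindex(vals)]

# Tree reduction with shfl_down: only the first lane ends up with the full result.
function reduce_down(op, vals)
    vals = copy(vals)
    offset = length(vals) ÷ 2
    while offset > 0
        vals = op.(vals, shfl_down(vals, offset))
        offset ÷= 2
    end
    return vals
end

# Butterfly reduction with shfl_xor: every lane ends up with the full result.
function reduce_xor(op, vals)
    vals = copy(vals)
    mask = length(vals) ÷ 2
    while mask > 0
        vals = op.(vals, shfl_xor(vals, mask))
        mask ÷= 2
    end
    return vals
end

vals = collect(1:32)
down = reduce_down(+, vals)
butterfly = reduce_xor(+, vals)

println(down[1] == sum(vals))               # first lane holds the total
println(down[end] == sum(vals))             # the last lane does not
println(all(==(sum(vals)), butterfly))      # every lane holds the total
```

Both patterns take the same number of shuffle steps. The difference shown here is only which lanes hold the final value, so the shfl_down form may be sufficient when a single lane per warp writes the result, while the shfl_xor form is useful when every lane needs it. Whether that is the reason for the choice in mapreduce.jl is not stated in this thread.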
