How do I make sure that GPU functions use the maximum potential config for performance?

Ahmed_Salih · January 15, 2023, 9:41am

Hello!

I am doing some programming on GPU using CUDA.

Instead of writing kernels, I am following the recommended approach of using the high-level array operations available in CUDA.jl, such as broadcasting (“.”), “map”, etc.

I noticed that when using custom kernels one would write something akin to:

@cuda threads=x blocks=y kernel(args...)

So, first question: how do I know how many threads and blocks I should use?

Second question: for the programming I do above using “default functions”, how do I ensure that it uses the maximum resources available?

Kind regards

Ahmed_Salih · January 15, 2023, 8:42pm

Could anyone point me to a good resource?

ToucheSir · January 15, 2023, 9:57pm

Have you seen the section with launch_configuration on Introduction · CUDA.jl?
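
For reference, here is a minimal sketch of the pattern that section describes: compile the kernel without launching it, ask launch_configuration for a suggested thread count, then derive the number of blocks from the problem size. The kernel gpu_add!, the arrays x_d and y_d, and the size N are made up for illustration; treat this as a starting point under those assumptions, not as the one correct configuration.

```julia
using CUDA

# Hypothetical example kernel: y[i] += x[i], written as a grid-stride loop
# so that it stays correct for any threads/blocks combination.
function gpu_add!(y, x)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    for i = index:stride:length(y)
        @inbounds y[i] += x[i]
    end
    return nothing
end

N = 2^20                      # illustrative problem size
x_d = CUDA.fill(1.0f0, N)
y_d = CUDA.fill(2.0f0, N)

# Compile the kernel but do not launch it yet.
kernel = @cuda launch=false gpu_add!(y_d, x_d)

# Query the occupancy API for a suggested launch configuration.
config = launch_configuration(kernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)      # enough blocks to cover all N elements

# Launch with the derived configuration and wait for completion.
CUDA.@sync kernel(y_d, x_d; threads, blocks)
```

The suggested values are a heuristic aimed at good occupancy; for a specific kernel it can still be worth benchmarking a few alternatives around them.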

dora_ho · January 16, 2023, 3:13pm

Check out this notebook from the developer of CUDA.jl:
cscs_gpu_course/2-2-kernel_analysis_optimization.ipynb at main · maleadt/cscs_gpu_course (github.com)
The occupancy API should be what you are looking for.
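
On the second question: as far as I understand, the high-level operations (broadcasting, map, reductions) choose their own launch configuration internally, so there are no threads or blocks to set by hand. A minimal sketch, with array names and sizes made up for illustration:

```julia
using CUDA

# Illustrative inputs living on the GPU.
x = CUDA.rand(Float32, 2^20)
y = CUDA.rand(Float32, 2^20)

# Fused broadcast: the whole expression is compiled into a single kernel.
z = 2f0 .* x .+ y

# map over GPU arrays also runs as a GPU kernel.
w = map((a, b) -> a * b, x, y)

# Reductions such as sum run on the GPU as well.
s = sum(z)
```

In this style, the main things within your control are fusing operations into a single broadcast and avoiding scalar indexing or unnecessary copies back to the CPU.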
