How do I make sure that GPU functions use the maximum potential config for performance?


I am doing some programming on GPU using CUDA.

Instead of writing kernels, I am following the recommended approach of using the array functions that CUDA.jl already provides, such as broadcasting (“.”), “map”, etc.

I noticed that when using custom kernels one would write something akin to:

@cuda threads=x blocks=y kernel(args...)

So, first question: how do I know which values of threads and blocks I should use?

Second question: for the programming I do above using the built-in functions, how do I ensure that it uses the maximum resources available?

Kind regards


Could anyone point me to a good resource? :blush:

Have you seen the section with launch_configuration on Introduction · CUDA.jl?
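For reference, the pattern from that section looks roughly like the sketch below: compile the kernel without launching it, ask `launch_configuration` for a suggested thread count, then derive the number of blocks from the array length. The kernel `add_kernel!` and the array sizes are made up for illustration; only `@cuda launch=false`, `launch_configuration`, and the launch keywords come from CUDA.jl.

```julia
using CUDA

# Hypothetical element-wise kernel for illustration: y[i] += x[i]
function add_kernel!(y, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)          # guard: the last block may overshoot the array
        @inbounds y[i] += x[i]
    end
    return nothing
end

x = CUDA.ones(Float32, 1_000_000)
y = CUDA.zeros(Float32, 1_000_000)

# Compile without launching, so the occupancy API can inspect the kernel.
kernel = @cuda launch=false add_kernel!(y, x)
config = launch_configuration(kernel.fun)

# Use the suggested thread count, capped at the problem size,
# and enough blocks to cover every element.
threads = min(length(y), config.threads)
blocks  = cld(length(y), threads)

CUDA.@sync kernel(y, x; threads, blocks)
```

The suggested configuration maximizes occupancy, which is usually a good starting point but not guaranteed to be the fastest choice, so it is still worth benchmarking.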


Check out this notebook from the developer of CUDA.jl:
cscs_gpu_course/2-2-kernel_analysis_optimization.ipynb at main · maleadt/cscs_gpu_course
The occupancy API should be what you are looking for.
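On the second question (broadcasting and `map`): as far as I know, these launch their own kernels and choose the launch configuration internally, so there are no threads or blocks to set by hand. What you can control is fusing the operations into one kernel and avoiding accidental scalar indexing. A small sketch, with made-up arrays and sizes:

```julia
using CUDA

# Hypothetical example data; the sizes are arbitrary.
x = CUDA.rand(Float32, 1_000_000)
y = CUDA.rand(Float32, 1_000_000)
a = 2f0

# Throw an error on slow element-by-element indexing from the CPU.
CUDA.allowscalar(false)

# Keep a CPU copy of the inputs only to check the results later.
x_host, y_host = Array(x), Array(y)

# All dotted operations fuse into a single GPU kernel.
y .= a .* x .+ y

# map likewise runs as one kernel over the arrays.
z = map((xi, yi) -> xi^2 + yi, x, y)
```

Writing the update as one dotted expression matters more here than any launch parameter: splitting it into several statements would launch several kernels and allocate temporaries.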