Dynamic parallelism slow in CUDA.jl

maleadt · July 25, 2024, 6:09pm

By default, space is reserved for 2048 pending child grids; this can be extended by setting the appropriate device limit, as in the following code.
…
The runtime first tries to add the newly launched grid to the fixed-size pool, and if it is full, uses the virtualized pool. While this means that grids are queued successfully, the costs of using the virtualized pool are higher than those of the fixed-size pool.
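For reference, here is a minimal sketch of how that limit might be raised from CUDA.jl. It assumes CUDA.jl exposes the driver's context limits through `CUDA.limit` / `CUDA.limit!`, with the enum value named `CUDA.LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT` (the driver's `CU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT` without the `CU_` prefix); check both against your installed version. The value 32768 is only an illustration, not a recommendation.

```julia
using CUDA

# Assumption: the driver's CUlimit enum is available under this name in CUDA.jl.
const PENDING_LAUNCHES = CUDA.LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT

# Inspect the current size of the fixed-size pool for pending child grids.
@show CUDA.limit(PENDING_LAUNCHES)

# Enlarge the pool before launching the parent kernel, so that child
# launches are less likely to spill into the slower virtualized pool.
CUDA.limit!(PENDING_LAUNCHES, 32768)   # illustrative value

# Read the limit back to confirm what the driver actually applied.
@show CUDA.limit(PENDING_LAUNCHES)
```

In CUDA C the corresponding runtime call is `cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, n)`. A larger pool reserves more device memory, so it is probably worth sizing it to the number of child grids you actually expect to be pending rather than setting it as high as possible.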
