Modifying a thread-local vector within CUDA Dynamic Parallelism

vchuravy · February 12, 2024, 8:10pm

No, sadly that is not possible. MArrays on the GPU currently depend on the compiler being able to inline every function that uses the MArray, so that it can then turn the GC allocation into a stack-allocated value as an optimization.

Since a kernel launched through dynamic parallelism is by definition a non-inlined function, this optimization cannot occur.
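To make the inlining requirement concrete, here is a minimal sketch of the case that usually does work: an MVector that never leaves a single kernel, with every helper that touches it inlined. The names `fill_squares!` and `kernel!` are made up for illustration. Whether the allocation is really turned into stack storage depends on the compiler and package versions, so treat this as an assumption-laden example rather than a guarantee.

```julia
using CUDA, StaticArrays

# Hypothetical helper. It is marked @inline so the compiler can see every
# use of `v` from inside the kernel and has a chance to elide the allocation.
@inline function fill_squares!(v)
    for i in 1:length(v)
        @inbounds v[i] = Float32(i)^2
    end
    return nothing
end

# Hypothetical kernel. The MVector is created, modified and consumed
# entirely inside this one kernel.
function kernel!(out)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        v = MVector{4, Float32}(undef)  # thread-local scratch vector
        fill_squares!(v)                # inlined, so `v` does not escape
        @inbounds out[i] = sum(v)
    end
    return nothing
end

out = CUDA.zeros(Float32, 32)
@cuda threads=32 kernel!(out)
```

Handing `v` to a child kernel started with `@cuda dynamic=true` from inside `kernel!` would break this pattern: the child is a separate, non-inlined function, so the MVector escapes and the allocation can no longer be removed.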

Additionally, I don’t even know if CUDA C supports this. I think you can use dynamic parallelism to launch sub-kernels with different launch configurations, and it is not clear to me which thread’s local-memory address would be passed to which thread in the sub-kernel.
