Working with shared memory as one or more variables, what is a good approach?

bcsj · January 30, 2023, 9:51am

I’m somewhat new to writing GPU code, but I am working on a fairly complicated kernel function that involves a number of steps in which shared memory is used to speed up the computations.

To be efficient, I reuse the preallocated shared memory in different parts of the code. In some parts, however, it would be beneficial to treat the shared memory as a single large array, e.g.

# case 1
shared = @cuDynamicSharedMem(T, 2 * size)

In other parts it would be convenient if the shared memory were partitioned into separate variables, e.g.

# case 2
shA = @cuDynamicSharedMem(T, size)
shB = @cuDynamicSharedMem(T, size, sizeof(shA))

So I had the following idea: allocate as in case 1, but then create some auxiliary variables that point into the memory. My thought was to do something like this:

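struct Partition{T,N}
    S::T      # the underlying shared array
    i::Int    # zero-based index of this partition
    # inner constructor: N is given explicitly, T is inferred from S
    Partition{N}(S::T, i::Int) where {T,N} = new{T,N}(S, i)
end
# extend Base.getindex so that P[i] works; partition i starts at offset i * N
Base.getindex(P::Partition{T,N}, i::Int) where {T,N} = P.S[P.i * N + i]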

and use it like:

# e.g. size = 2^10
shared = @cuDynamicSharedMem(T, 2 * size)
shA = Partition{size}(shared, 0)
shB = Partition{size}(shared, 1)
# shA[i], shB[i] ( = shared[i], shared[size+i] )

My questions are now:

Is this idea reasonable?
Is there a better way to do something like this, e.g. something built-in?
Is there some performance aspect I’m neglecting where I might be stepping on my own toes by doing this?

maleadt · January 30, 2023, 1:07pm

view? https://github.com/JuliaGPU/CUDA.jl/blob/7681e085bde038730e11a7ff123937ccb325e910/test/device/intrinsics/memory.jl#L167-L200
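
A minimal sketch of what the `view` suggestion could look like, run on the CPU so it works without a GPU. It assumes that a plain `Vector` is an adequate stand-in for the array returned by `@cuDynamicSharedMem(T, 2 * size)`; the names `n`, `shared`, `shA` and `shB` are illustrative. Inside a kernel the same two `view` calls would presumably be applied to the shared array itself.

```julia
# Stand-in for the dynamic shared-memory buffer (assumption: in a kernel
# this would be the array returned by @cuDynamicSharedMem(T, 2 * n)).
n = 4
shared = zeros(Float32, 2 * n)

# Two non-copying windows into the same buffer.
shA = view(shared, 1:n)         # covers shared[1:n]
shB = view(shared, n+1:2n)      # covers shared[n+1:2n]

# Writes through a view land in the underlying buffer.
shA[1] = 1f0                    # same element as shared[1]
shB[1] = 2f0                    # same element as shared[n + 1]

@assert shared[1] == 1f0
@assert shared[n + 1] == 2f0
@assert length(shA) == n && length(shB) == n
```

Since `view` returns a `SubArray` that shares storage with its parent, the code can index `shared` directly where one large array is convenient and use `shA` / `shB` where separate variables read better, without a custom wrapper type.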

bcsj · January 30, 2023, 1:14pm

Now I just feel embarrassed.
