Does shared memory really accelerate GPU kernels?

maleadt · December 2, 2024, 1:39pm

Shared memory is not always going to improve performance. For one, it may lower occupancy: it is a limited resource shared by the threads running on a multiprocessor, so the amount each block requests caps how many threads can be resident at once. But also, you seem to be using it here simply to cache accesses to read-only arrays. Modern GPUs are much better at caching such reads automatically, which may explain why shared memory doesn’t help here. It is still very relevant as a communication mechanism between threads, e.g., to implement a reduction.
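To make the reduction case concrete, here is a minimal sketch in CUDA C++ of a per-block sum where shared memory is used for communication rather than caching: each thread has to read values that other threads wrote. This is not the code from the original question; the names `block_sum`, `tile`, and `BLOCK`, the input size, and the block size of 256 are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch; must be a power of two
// because the loop below halves the stride each step.
constexpr int BLOCK = 256;

// Each block sums up to BLOCK consecutive elements of `in` and writes
// one partial sum to `out[blockIdx.x]`.
__global__ void block_sum(const float* in, float* out, int n) {
    // Visible to every thread in the block: this is the communication channel.
    __shared__ float tile[BLOCK];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Stage one element per thread; pad out-of-range slots with zero.
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: each step adds the upper half onto the lower half,
    // so threads consume values written by other threads.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();  // reached by all threads, outside the branch
    }

    if (tid == 0) {
        out[blockIdx.x] = tile[0];
    }
}

int main() {
    const int n = 1000;  // arbitrary test size, not a multiple of BLOCK
    const int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<float> h_in(n, 1.0f);
    std::vector<float> h_out(blocks, 0.0f);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_out, n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Finish the reduction on the host by adding the per-block partial sums.
    float total = 0.0f;
    for (float p : h_out) total += p;
    std::printf("sum = %.1f (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return total == static_cast<float>(n) ? 0 : 1;
}
```

The same result cannot be obtained from the automatic caches alone, because each thread needs partial sums produced by its neighbours. That is the distinction from staging read-only inputs in shared memory, where the hardware cache may already be doing the work.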

If you want to be sure, run the two kernels under Nsight Compute, which can show you accurately how memory is accessed and cached.
