Usage of CUDA.Const

I am unsure how to use the CUDA.Const function to mark an array as constant during the execution of a CUDA kernel. From the few examples I saw, it looks like I just need to define a variable within the kernel that “shadow” the original array, as in cxs = CUDA.Const(xs), and then I can directly use the new variable cxs. I tried that and it works, but surprisingly I do not see any benefit from its usage, although I would expect a significant one (the same kernel, rewritten with shared memory for xs, is almost twice as fast).

Am I missing something or is this expected?

With Const we emit loads that use a different GPU cache level (read-only texture cache instead of L1/L2), which used to be much faster. Modern hardware has much faster L1/L2 caches, resulting in ldg being much less relevant than it used to be. So depending on your hardware, this is expected.