Constant Memory?


#1

I believe there is support for Shared Memory. However, I cannot find any references to Constant Memory being supported by CUDAnative.

As I read through books on CUDA programming, it seems like a useful thing that will help improve performance.


#2

With CUDAnative:

With GPUArrays you can also use this:

This creates shared memory in a hardware-independent way (it will also work with CLArrays and CuArrays).


#3

But this looks like shared memory, not constant memory. Wasn’t the question about constant memory, i.e. memory for constants?


#4

I agree, that doesn’t seem to answer the question. Constant memory on the GPU is entirely different from regular shared memory. And there is also texture memory.
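To make the distinction concrete, here is a minimal sketch in plain CUDA C (not CUDAnative) that uses both kinds of memory in one kernel. Constant memory is declared at file scope, filled from the host, and is read-only inside the kernel; shared memory is per-block scratch space that the threads of a block read and write. The names `c_gain`, `reverse_and_scale` and `run_demo` are made up for this illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

// Constant memory: written by the host, read-only inside kernels.
__constant__ float c_gain;

// Reverses one block of data through shared memory and scales it
// by the value held in constant memory.
__global__ void reverse_and_scale(const float *in, float *out)
{
    // Shared memory: scratch space visible to all threads of this block.
    __shared__ float tile[BLOCK];

    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();  // wait until every thread has stored its element

    out[t] = c_gain * tile[BLOCK - 1 - t];
}

// Hypothetical host driver: returns the number of wrong output elements.
int run_demo()
{
    float h_in[BLOCK], h_out[BLOCK] = {0};
    for (int i = 0; i < BLOCK; ++i) h_in[i] = (float)i;
    const float gain = 2.0f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // Constant memory is filled by symbol, not through a device pointer.
    cudaMemcpyToSymbol(c_gain, &gain, sizeof(gain));

    reverse_and_scale<<<1, BLOCK>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < BLOCK; ++i)
        if (h_out[i] != gain * h_in[BLOCK - 1 - i]) ++mismatches;

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches;
}

int main()
{
    int mismatches = run_demo();
    printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```

Constant memory pays off mainly when all threads of a warp read the same address, as they do with `c_gain` here.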

All these different kinds of memory, manually managed by the programmer, are what makes GPU programming so much fun! On the CPU, I pine for the ability (indeed, the requirement!) to manually decide what is in the L1, L2, and L3 cache.


#5

Correct, constant memory is currently not supported (but it’s kinda hard to get reliable speedups from it anyways, so no big loss IMO). A student at my lab is working on this, so expect improvements some time next year.


#6

Oh how true! Sorry, I read it all wrong…

Yeah I guess this is a feature not too high on the priority list, but should be pretty easy to add if it’s really needed! :wink:


#7

But I guess until these features are implemented, Julia’s GPU ecosystem will be incomplete. I have some friends working in CUDA C, and I find it hard to recommend CUDAnative as a complete alternative in Julia because it lacks some of these features. A vote for pushing it up the priority list, if votes count :slight_smile:


#8

By that reasoning, the ecosystem will always be incomplete. We don’t have the manpower to implement all features, let alone do it as soon as NVIDIA releases them, so it will always be a matter of priorities. Right now, constant memory is not high on that list, especially because I haven’t seen much code using it. Feel free to suggest features we should work on, though.


#9

Hi Tim,

how is CUDAnative.jl coming along with texture memory?

I am currently implementing 2D numerical integration with the trapezoid rule. It involves a plaquette-averaging operation that is identical to the one used in image blurring. It would be very nice to have access to texture memory from CUDAnative!


#10

Not implemented yet. But we have __ldg now, which loads through the texture cache, so should yield similar results.
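For reference, this is roughly what that pattern looks like in plain CUDA C, as a sketch rather than CUDAnative code. It assumes that "plaquette averaging" means averaging the four corner values of each grid cell, which is what the composite 2D trapezoid rule sums up. The names `plaquette_average` and `integrate_demo` and the test function f(x, y) = x + y are chosen only for this illustration; `__ldg` needs compute capability 3.5 or newer, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per grid cell: average the four corner values of the cell,
// loading each of them through the read-only (texture) data cache.
__global__ void plaquette_average(const float *__restrict__ f, float *avg,
                                  int nx, int ny)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx - 1 || iy >= ny - 1) return;  // there are (nx-1) x (ny-1) cells

    float s = __ldg(&f[iy * nx + ix])       + __ldg(&f[iy * nx + ix + 1])
            + __ldg(&f[(iy + 1) * nx + ix]) + __ldg(&f[(iy + 1) * nx + ix + 1]);
    avg[iy * (nx - 1) + ix] = 0.25f * s;
}

// Hypothetical driver: integrates f(x, y) = x + y over the unit square.
float integrate_demo()
{
    const int nx = 17, ny = 17;
    const float hx = 1.0f / (nx - 1), hy = 1.0f / (ny - 1);

    std::vector<float> h_f(nx * ny), h_avg((nx - 1) * (ny - 1), 0.0f);
    for (int iy = 0; iy < ny; ++iy)
        for (int ix = 0; ix < nx; ++ix)
            h_f[iy * nx + ix] = ix * hx + iy * hy;

    float *d_f = nullptr, *d_avg = nullptr;
    cudaMalloc(&d_f, h_f.size() * sizeof(float));
    cudaMalloc(&d_avg, h_avg.size() * sizeof(float));
    cudaMemcpy(d_f, h_f.data(), h_f.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((nx - 1 + block.x - 1) / block.x,
              (ny - 1 + block.y - 1) / block.y);
    plaquette_average<<<grid, block>>>(d_f, d_avg, nx, ny);

    cudaMemcpy(h_avg.data(), d_avg, h_avg.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_f);
    cudaFree(d_avg);

    // Trapezoid rule: sum of cell averages times the cell area.
    float sum = 0.0f;
    for (float a : h_avg) sum += a;
    return sum * hx * hy;
}

int main()
{
    // The exact integral of x + y over [0,1] x [0,1] is 1.
    printf("integral estimate: %f\n", integrate_demo());
    return 0;
}
```

The final sum is done on the host here only to keep the sketch short; a real implementation would reduce on the device.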


#11

Yes, I wrote a general-purpose reduction kernel, and I need to pass some “control parameters” (for instance, the shapes of the input and output tensors) to all threads for boundary checking. These parameters could be stored in constant memory simply because they are literally constant. I am not very skilled in CUDA, and I was not able to track how these “control parameters” are actually handled.
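In CUDA C, that use case would look roughly like the sketch below: the shape lives in a `__constant__` struct, and every thread reads it for its bounds check and loop limit. This is plain CUDA C, not CUDAnative, and the names `Shape`, `c_shape`, `row_sums` and `run_row_sums` are invented for the example. Note also that in CUDA C, by-value kernel arguments are themselves delivered through constant memory, so small parameters like these often work just as well as ordinary arguments; how CUDAnative handles them is not something this sketch shows.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical "control parameters": the shape of a row-major input matrix.
struct Shape { int rows; int cols; };

// Stored once in constant memory and read by every thread.
__constant__ Shape c_shape;

// One thread per row; each thread sums its row (a reduction over columns).
__global__ void row_sums(const float *in, float *out)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= c_shape.rows) return;  // bounds check against the constant shape

    float acc = 0.0f;
    for (int c = 0; c < c_shape.cols; ++c)
        acc += in[r * c_shape.cols + c];
    out[r] = acc;
}

// Hypothetical driver: returns the number of rows with a wrong sum.
int run_row_sums()
{
    const Shape shape = {100, 7};
    std::vector<float> h_in(shape.rows * shape.cols, 1.0f);
    std::vector<float> h_out(shape.rows, 0.0f);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    // Copy the control parameters into constant memory.
    cudaMemcpyToSymbol(c_shape, &shape, sizeof(Shape));

    // 128 threads for 100 rows, so the bounds check is actually exercised.
    const int threads = 128;
    const int blocks = (shape.rows + threads - 1) / threads;
    row_sums<<<blocks, threads>>>(d_in, d_out);

    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int wrong = 0;
    for (float s : h_out)
        if (s != (float)shape.cols) ++wrong;  // every row of ones sums to cols
    return wrong;
}

int main()
{
    int wrong = run_row_sums();
    printf("rows with wrong sum: %d\n", wrong);
    return wrong != 0;
}
```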


#12

Reading the post in that development. Thank you for the info!