I’m both a Julia and a CUDAnative newbie. CUDAnative is attracting me from my native numpy environment! This is probably a FAQ:
I have several arrays I want to live in Shared Memory. (They are constant throughout my computation, if that happens to matter.) I have found the @cuStaticSharedMem macro, and tried to initialize it on the host in (what to me is) the obvious way:
d_bc1d = @cuStaticSharedMem(Float32,size(bound_coords_1d))
d_bc1d .= bound_coords_1d
This causes the IJulia kernel to die when executed from a Jupyter notebook.
What is the proper idiom for accomplishing this task?
Many thanks in advance for any help!!!
Shared memory is not to be initialized on the host, but only on the device. See these examples: https://devblogs.nvidia.com/using-shared-memory-cuda-cc/
This means the @cuStaticSharedMem macro doesn’t work in host context; it can only be used in device code.
Also, if those arrays are going to be constant, shared memory is probably not what you’re looking for. GPUs have caches where loaded values end up, so you probably don’t need to do anything special. If the arrays are small, constant memory could help, or texture memory otherwise, but neither is currently supported by CUDAnative. Shared memory is for things like communicating intermediate results to other threads in the block, outside of your own warp.
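To make that concrete, here is a rough sketch of the usual pattern (the kernel body, the hard-coded block size of 256, and the toy reversal are all made up for illustration, and it assumes a recent CUDAnative with the @cuda threads=... launch syntax). The constant input is simply passed in as an ordinary device array, and @cuStaticSharedMem only appears inside the kernel:

using CUDAnative, CuArrays

function kernel(coords, out)
    i = threadIdx().x
    # per-block buffer, allocated on the device; the size must be known at compile time
    tmp = @cuStaticSharedMem(Float32, 256)
    tmp[i] = 2f0 * coords[i]   # each thread fills one slot
    sync_threads()             # make every thread's write visible to the whole block
    out[i] = tmp[257 - i]      # threads can now read what *other* threads wrote
    return
end

d_coords = CuArray(rand(Float32, 256))   # constant data can just live in global memory
d_out    = similar(d_coords)
@cuda threads=256 kernel(d_coords, d_out)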
For future reference, please provide full examples. That makes it much easier to provide help.
OK, thanks @maleadt !
It’s a steep learning curve, and your help is much appreciated!
Yeah, CUDAnative is pretty low-level. For an actual example of shared memory, see e.g. the reduce example: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/examples/reduce/reduce.jl#L31
Implementations like this are supposed to end up in CuArrays/GPUArrays, where they are much more user-friendly. So depending on your use case you might want to have a look at those packages instead. But if you do need custom kernels, feel free to post any problems you have here.
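For example, with CuArrays you upload the data once and then use ordinary array operations, which get compiled to GPU kernels for you (the sin.(...) .+ 1 computation below is just a placeholder for whatever you actually do with bound_coords_1d):

using CuArrays

d_bc1d   = CuArray(bound_coords_1d)   # copy the constant coordinates to the GPU once
d_result = sin.(d_bc1d) .+ 1f0        # broadcasts like this run on the device

Array(d_result)                       # copy back to the host when you need the values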