Hi,
I’m trying to learn how to use shared memory on the GPU by following the very nice write-up in
https://jenni-westoby.github.io/Julia_GPU_examples/dev/Vector_dot_product/
however, I fail to understand basic things here. I’m referring to the Vector Dot Product example.
I have several questions, but maybe some of them will be answered by the answers to the previous ones, so I won’t ask them all in advance.
So the first question is: when you run the code as suggested
```julia
@cuda blocks=blocksPerGrid threads=threadsPerBlock shmem=(threadsPerBlock * sizeof(Int64)) dot(a, b, c, N, threadsPerBlock, blocksPerGrid)
```
one must understand that all blocks, and all threads within each block, are executed in parallel, right? So the body of dot(a, b, c, N, threadsPerBlock, blocksPerGrid) is meant to be read as what happens to one generic thread in one generic block. Am I right or wrong?
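To make my mental model concrete, here is a minimal sketch of how I picture the per-thread view (my own toy example, not from the tutorial; the kernel name and the sizes are just illustrative):

```julia
using CUDA

function kernel(a, b, c)
    # Every thread executes this same body; threadIdx()/blockIdx()
    # tell each thread which element it is responsible for.
    tid = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if tid <= length(c)
        c[tid] = a[tid] * b[tid]
    end
    return nothing
end

a = CUDA.fill(1.0f0, 16)
b = CUDA.fill(2.0f0, 16)
c = CUDA.zeros(Float32, 16)
@cuda blocks=4 threads=4 kernel(a, b, c)   # 16 threads in total, all running the same body
```

If that picture is correct, each of the 16 threads runs the body exactly once, each with its own tid.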
I ask this very basic thing because, not understanding well what happens, the code seems (to my dumb eyes) to be mixing things. For instance, in the example you read
```julia
function dot(a, b, c, N, threadsPerBlock, blocksPerGrid)
    # Set up shared memory cache for this current block.
    cache = @cuDynamicSharedMem(Int64, threadsPerBlock)
    # ... (rest of the kernel omitted)
```
and to my eyes this cache variable only makes sense if I consider a whole block as being processed inside the function. Otherwise, if the function describes what happens to one particular thread in one particular block, it would seem to be initializing an array for the whole block once per thread.
Now, related to this: should I understand that this cache variable stores one such array in every block? On the other hand, if I’m right and the whole function describes what happens to a single thread in a single block, then why does it declare cache as an array sized for the whole block?
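Put differently, here is the kind of toy kernel I have in mind when asking. If my reading is right, every thread executes the @cuDynamicSharedMem line, yet all threads of one block obtain the same per-block buffer and fill it cooperatively. This is my own sketch under that assumption, reusing the tutorial’s @cuDynamicSharedMem (newer CUDA.jl spells this CuDynamicSharedArray, I believe); block_sums and all the sizes are hypothetical names of mine:

```julia
using CUDA

function block_sums(a, out)
    # Every thread runs this line, but (on my reading) it does not
    # allocate a new array per thread: all threads of one block get
    # a view of the same dynamically allocated shared buffer.
    cache = @cuDynamicSharedMem(Float32, blockDim().x)
    tid = threadIdx().x
    gid = tid + (blockIdx().x - 1) * blockDim().x
    cache[tid] = a[gid]   # each thread fills its own slot
    sync_threads()        # wait until the whole block has written
    if tid == 1           # one thread per block then reduces the cache
        s = 0.0f0
        for i in 1:blockDim().x
            s += cache[i]
        end
        out[blockIdx().x] = s
    end
    return nothing
end

a = CUDA.rand(Float32, 32)
out = CUDA.zeros(Float32, 4)   # one partial sum per block
@cuda blocks=4 threads=8 shmem=8*sizeof(Float32) block_sums(a, out)
```

Is that the right way to read the tutorial’s cache, i.e. one shared array per block, declared once by every thread but always referring to the same memory?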
As you can see, I’m in a mess here. I’m 100% sure I’m not understanding something quite basic, so I’ll appreciate it if somebody can shed some light.
Thanks for your patience,
Ferran.