@maleadt I’m sorry, I was too quick to mark the answer as the solution, but I’m stuck again…
I implemented the code for my own model (i.e. the dummy line is replaced with an update equation such as `Δαᵢ = 0.1 * xᵢ⊤d`). When the code was executed on the CPU, the result was fine. On the other hand, when it was executed on the GPU with `blocks=Int(floor(numsample/32)) threads=32` instead of `threads=numsample`, the numerical result blew up to extremely large values…
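To make the question concrete, here is a rough sketch of what my kernel does (the names and the loop body are illustrative, not my exact code):

```julia
using CUDA

# Illustrative sketch: each thread i computes xᵢ⊤d by reading the shared
# vector d, then derives Δαᵢ from it (the real update equation is different).
function update_kernel!(α, d, X)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(α)
        acc = 0.0f0
        for j in 1:length(d)
            acc += X[i, j] * d[j]   # reads d, which other threads may be updating
        end
        α[i] += 0.1f0 * acc         # dummy update equation (placeholder)
        # ... d is then updated based on Δαᵢ (omitted)
    end
    return nothing
end

# launched as:
# @cuda blocks=Int(floor(numsample/32)) threads=32 update_kernel!(α, d, X)
```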
Now I would like to ask the following.
- Can `d::CuVector` be shared by all blocks and threads?
  - I expected that the read of `d` (at `xᵢ⊤d += Xᵢⱼ * d[j]`) is an atomic read, so while one thread is reading `d[j]`, no other block or thread can alter or corrupt the value of `d[j]` it is reading.
  - I expected that an atomic update of `d` becomes visible to all blocks and threads immediately after the update (see the sketch after this list for what I had in mind).
- Or is `d::CuVector` just a host representation, so that `d` is no longer shared once it has been copied to GPU memory? If so, is there any way to realize my expectations?
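For reference, this is roughly the kind of atomic update I had in mind, using `CUDA.@atomic` (just my guess at the intended usage; I am not sure whether it also makes the reads described above safe):

```julia
# Illustrative only: atomically accumulate each thread's contribution into the
# shared vector d, so concurrent read-modify-writes on d[j] do not clobber each other.
function atomic_update_kernel!(d, Δα, X)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(X, 1)
        for j in 1:length(d)
            CUDA.@atomic d[j] += Δα[i] * X[i, j]
        end
    end
    return nothing
end
```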