using CUDA

index = CuArray{UInt32}([1])
function ker(index)
    i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
    @cuprintln "i: " i
    @cuprintln "index: " index[]
    # read-modify-write of device memory (not atomic; fine with a single thread)
    @cuprintln "old_index: " index[] += 1
    return nothing
end
@cuda threads=1 blocks=1 ker(index)
In the REPL this prints:
i: 1
index: 1
old_index: 2
whereas under Nsight Compute it prints:
i: 1
index: 1
old_index: 2
i: 1
index: 2
old_index: 3
i: 1
index: 3
old_index: 4
...
i: 1
index: 44
old_index: 45
(the same three lines are printed 44 times in total, with index counting up from 1 to 44 and old_index from 2 to 45)
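A side note on the labels, as a CPU-only Julia sketch (no GPU needed): the third line is labelled old_index, yet in the REPL run it shows 2. That is because `x[] += 1` is shorthand for `x[] = x[] + 1`, and an assignment expression evaluates to its right-hand side, i.e. the incremented value. The names below are hypothetical, and a plain one-element Vector stands in for the CuArray.

```julia
# CPU-only sketch: a one-element vector standing in for CuArray{UInt32}([1])
counter = UInt32[1]

# `counter[] += 1` means `counter[] = counter[] + 1`;
# the whole expression evaluates to the right-hand side, i.e. the NEW value
new_val = (counter[] += 1)
println("value of `counter[] += 1`: ", new_val)

# To report the value from before the increment, read it first
old_val = counter[]
counter[] += 1
println("old: ", old_val, ", now: ", counter[])
```

This only affects what the label means; it does not explain the drift across runs.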
It seems Nsight Compute reruns the kernel many times, and since every run increments index, the value each run sees keeps growing instead of staying at 1.
Is there a way to fix this? I'm using index to index into data, so under Nsight Compute it causes out-of-bounds errors. In case it matters, I'm stuck on Nsight Compute version 2019.5.1 because I'm using a GTX 1070.
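A hypothetical, CPU-only model of the suspected mechanism: if a profiler executes the same kernel body several times (passes) and the memory it writes is not restored between passes, a kernel that read-modify-writes persistent state sees a different input in every pass. The functions kernel_body! and replay! and the restore flag are made up for illustration; they are an assumption-laden sketch and do not describe how Nsight Compute is actually implemented.

```julia
# Hypothetical model of one kernel launch being executed several times.
# `state` stands in for device memory (the one-element `index` array).

# Same logic as `ker`: read the counter, then increment it in place.
function kernel_body!(state::Vector{UInt32}, seen::Vector{UInt32})
    push!(seen, state[])   # record the value this pass observed
    state[] += 1           # side effect that persists into the next pass
    return nothing
end

# Run `passes` times; if `restore` is true, reset memory to its
# initial contents before each pass.
function replay!(state::Vector{UInt32}, passes::Int; restore::Bool)
    snapshot = copy(state)
    seen = UInt32[]
    for _ in 1:passes
        restore && copyto!(state, snapshot)
        kernel_body!(state, seen)
    end
    return seen
end

drifting = replay!(UInt32[1], 44; restore=false)
println(first(drifting), " ... ", last(drifting))

stable = replay!(UInt32[1], 44; restore=true)
println("distinct values seen: ", Int.(unique(stable)))
```

Without restoring, the passes observe 1 through 44, the same pattern as the log above; with restoring, every pass observes 1. The model only shows why a non-idempotent kernel is sensitive to being rerun.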