using CUDA

index = CuArray{UInt32}([1])

function ker(index)
    # Global 1-based thread index
    i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
    @cuprintln "i: " i
    @cuprintln "index: " index[]
    # `index[] += 1` evaluates to the incremented value, so this prints the new value
    @cuprintln "old_index: " index[] += 1
    return nothing
end

@cuda threads=1 blocks=1 ker(index)
In the REPL this prints:
i: 1
index: 1
old_index: 2
whereas under Nsight Compute it prints:
i: 1
index: 1
old_index: 2
i: 1
index: 2
old_index: 3
i: 1
index: 3
old_index: 4
i: 1
index: 4
old_index: 5
i: 1
index: 5
old_index: 6
i: 1
index: 6
old_index: 7
i: 1
index: 7
old_index: 8
i: 1
index: 8
old_index: 9
i: 1
index: 9
old_index: 10
i: 1
index: 10
old_index: 11
i: 1
index: 11
old_index: 12
i: 1
index: 12
old_index: 13
i: 1
index: 13
old_index: 14
i: 1
index: 14
old_index: 15
i: 1
index: 15
old_index: 16
i: 1
index: 16
old_index: 17
i: 1
index: 17
old_index: 18
i: 1
index: 18
old_index: 19
i: 1
index: 19
old_index: 20
i: 1
index: 20
old_index: 21
i: 1
index: 21
old_index: 22
i: 1
index: 22
old_index: 23
i: 1
index: 23
old_index: 24
i: 1
index: 24
old_index: 25
i: 1
index: 25
old_index: 26
i: 1
index: 26
old_index: 27
i: 1
index: 27
old_index: 28
i: 1
index: 28
old_index: 29
i: 1
index: 29
old_index: 30
i: 1
index: 30
old_index: 31
i: 1
index: 31
old_index: 32
i: 1
index: 32
old_index: 33
i: 1
index: 33
old_index: 34
i: 1
index: 34
old_index: 35
i: 1
index: 35
old_index: 36
i: 1
index: 36
old_index: 37
i: 1
index: 37
old_index: 38
i: 1
index: 38
old_index: 39
i: 1
index: 39
old_index: 40
i: 1
index: 40
old_index: 41
i: 1
index: 41
old_index: 42
i: 1
index: 42
old_index: 43
i: 1
index: 43
old_index: 44
i: 1
index: 44
old_index: 45
It seems Nsight Compute reruns the kernel many times (44 times in the output above), and because index is not reset between runs, its value keeps growing and ends up wrong.
Is there a way to fix this? I'm using index to index into data, so the extra increments cause out-of-bounds errors under Nsight Compute. If it matters, I'm stuck on Nsight Compute version 2019.5.1 because I'm using a GTX 1070.