Hi again guys

This time I'm facing a problem that makes me think I'm missing something very important about CUDA programming, so maybe somebody can help me understand what is going on. Consider this:

```
using CuArrays, CUDAnative

m = CuArrays.zeros(2)                       # scratch buffer passed to the kernel
cic = CuArray(CartesianIndices(rand(3,4)))
res = CuArrays.zeros(length(cic))
Ncuts = 2

function test2_CUDA(res, CI, m, Ncuts)
    # grid-stride loop over all Cartesian indices
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = index:stride:length(CI)
        for k in 1:Ncuts
            m[k] = CI[i][k] - 1
        end
        for k in 1:Ncuts
            res[i] += m[k]
        end
    end
    return nothing
end

numblocks = 256
@cuda threads=256 blocks=numblocks test2_CUDA(res, cic, m, Ncuts)
res
12-element CuArray{Float32,1,Nothing}:
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
3.0
```

Now the question is: how can all the elements of the array res have the same value? Each entry is supposed to be the sum of the components of the corresponding CartesianIndex (with 1 subtracted from each component), so definitely not a constant…
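
My only guess so far (and I may well be wrong) is that m is a single buffer that every GPU thread writes to, so the threads overwrite each other's values before they are read back. The following plain-Julia sketch needs no GPU and imitates one possible interleaving: all "threads" write first, then all of them read. It is only an illustration under that assumption, not a description of what the hardware actually does.

```julia
# CPU-only imitation of one possible thread interleaving (assumption, see above).
cic = CartesianIndices((3, 4))
Ncuts = 2
m = zeros(Float32, Ncuts)            # one scratch buffer shared by every "thread"
res = zeros(Float32, length(cic))

# Phase 1: every "thread" i writes its own values into the same m.
for i in 1:length(cic)
    for k in 1:Ncuts
        m[k] = cic[i][k] - 1
    end
end

# Phase 2: every "thread" reads m, which now only holds the last write.
for i in 1:length(cic)
    for k in 1:Ncuts
        res[i] += m[k]
    end
end

println(res)   # all entries are identical
```

In this sequential imitation the last writer is always index (3,4), so every entry ends up as 5.0. On the GPU the winning write would depend on how the threads are scheduled, which might explain why I see a different constant (3.0) there.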

For comparison, here is the same calculation without CUDA:

```
# Same calculation on the CPU
function test2(res, CI, m, Ncuts)
    for i in 1:length(CI)
        for k in 1:Ncuts
            m[k] = CI[i][k] - 1
        end
        for k in 1:Ncuts
            res[i] += m[k]
        end
    end
end

m = zeros(2)
cic = CartesianIndices(rand(3,4))
res = zeros(length(cic))
Ncuts = 2
test2(res, cic, m, Ncuts)
res
12-element Array{Float64,1}:
0.0
1.0
2.0
1.0
2.0
3.0
2.0
3.0
4.0
3.0
4.0
5.0
```
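
If the scratch array m being shared between all GPU threads is the culprit (an assumption on my side), one variant worth trying is to drop m and let each thread accumulate into its own local variable. The sketch below assumes the newer CUDA.jl package instead of CuArrays/CUDAnative; the kernel name test2_local! and the variable acc are made up for illustration, and the code requires a CUDA-capable GPU.

```julia
using CUDA  # assumption: CUDA.jl, the successor of CuArrays/CUDAnative

# Hypothetical kernel: no shared scratch buffer, each thread keeps a local sum.
function test2_local!(res, CI, Ncuts)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i = index:stride:length(CI)
        acc = 0.0f0                  # thread-local accumulator
        for k in 1:Ncuts
            acc += CI[i][k] - 1
        end
        res[i] += acc                # each i is handled by exactly one thread
    end
    return nothing
end

cic = CuArray(vec(collect(CartesianIndices((3, 4)))))
res = CUDA.zeros(Float32, length(cic))
@cuda threads=256 blocks=1 test2_local!(res, cic, 2)
println(Array(res))                  # copy back to the host before printing
```

Since no two threads touch the same memory location here, I would expect this to agree with the CPU result above.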

Can somebody help me understand this?

Thanks a lot,

Ferran.