CUDA race conditions

Hi all, I am trying to parallelize some of my code with CUDA kernels, but I seem to have introduced a race condition.

The kernel looks roughly like this:

function updateB!(inputs)
    x = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if x >= 2 && x <= nx - 2
        dA[x] = A[x+1] - A[x]
        B[x] += scalar * dA[x]
    end
    return nothing
end

It probably has something to do with the fact that I update B[x] at the end, with each thread trying to write into that array. However, I have not been able to find a solution for this.
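To make the problem easier to reproduce, a minimal self-contained version of the pattern above might look like the sketch below. It assumes CUDA.jl and a CUDA-capable GPU. The explicit argument list (B, dA, A, scalar, nx), the array size, the value of scalar, and the launch configuration are all placeholders chosen for illustration, not the values from the real code. The CPU loop at the end is only there as a reference to compare the GPU result against.

```julia
using CUDA

# Same update as in the kernel above, but with every array passed
# explicitly as a kernel argument (assumed signature, for illustration).
function updateB!(B, dA, A, scalar, nx)
    x = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if x >= 2 && x <= nx - 2
        @inbounds dA[x] = A[x + 1] - A[x]   # each thread writes only dA[x]
        @inbounds B[x] += scalar * dA[x]    # each thread writes only B[x]
    end
    return nothing
end

nx = 1024            # placeholder problem size
scalar = 0.5f0       # placeholder value
A  = CUDA.rand(Float32, nx)
B  = CUDA.zeros(Float32, nx)
dA = CUDA.zeros(Float32, nx)

threads = 256                 # placeholder launch configuration
blocks = cld(nx, threads)     # enough blocks to cover all nx indices
@cuda threads=threads blocks=blocks updateB!(B, dA, A, scalar, nx)
CUDA.synchronize()

# Serial CPU reference of the same update, for comparison.
A_h = Array(A)
B_ref = zeros(Float32, nx)
for x in 2:nx-2
    B_ref[x] += scalar * (A_h[x + 1] - A_h[x])
end

# Largest absolute difference between GPU and CPU results.
max_err = maximum(abs.(Array(B) .- B_ref))
```

In this reduced form each thread only writes its own dA[x] and B[x], so comparing against the CPU loop should show whether the mismatch comes from this kernel or from somewhere else in the code.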