Hi,

The following lines of code:

```
b, r = CUDA.threadIdx().x, CUDA.blockIdx().x
Ush = @cuStaticSharedMem(T, (D,2))
for id1 in N:-1:1
    bu1, ru1 = up((b, r), id1, lp)
    Ush[b,1] = U[b,id1,r]
    for id2 = 1:id1-1
        bu2, ru2 = up((b, r), id2, lp)
        Ush[b,2] = U[b,id2,r]
        sync_threads()
        ipl = ipl + 1
        if ru2 == r
            gt2 = Ush[bu2,1]
        else
            gt2 = U[bu2,id1,ru2]
        end
        # Do some computation with gt2
    end
end
```

should always store in the variable `gt2` the quantity `U[bu2,id1,ru2]`, the only difference being that the value is read from the shared memory `Ush` when it is available there (i.e. when `ru2 == r`).
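To make the expected behaviour concrete, here is a minimal CPU-only sketch of that invariant (no GPU needed). It is not my actual kernel: the sizes, the `ComplexF64` element type and the helper `up_demo` are hypothetical stand-ins for `T` and my real `up` function, chosen only so the snippet runs on its own.

```julia
# CPU model of the shared-memory staging: a copy of U[:, id1, r] plays the
# role of Ush[:, 1] for block r. All sizes below are assumed for illustration.
D, N, R = 4, 3, 2                 # threads per block, directions, blocks
U = rand(ComplexF64, D, N, R)     # ComplexF64 stands in for the real type T

# Hypothetical neighbour function (NOT the real `up`): next thread index,
# wrapping into the next block at the block boundary.
up_demo(b, r, D, R) = b < D ? (b + 1, r) : (1, mod1(r + 1, R))

function count_mismatches(U, D, N, R)
    bad = 0
    for r in 1:R, id1 in N:-1:1
        Ush = U[:, id1, r]        # staged copy, as Ush[b,1] = U[b,id1,r]
        for b in 1:D
            bu2, ru2 = up_demo(b, r, D, R)
            # Same branch as in the kernel: staged copy if in this block
            gt2 = ru2 == r ? Ush[bu2] : U[bu2, id1, ru2]
            bad += gt2 != U[bu2, id1, ru2]
        end
    end
    return bad
end

println(count_mismatches(U, D, N, R))   # prints 0
```

On the CPU both branches always agree, so the mismatch count is zero; this is the property I expect the GPU kernel to satisfy as well.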

Unfortunately, this is not the case with the latest version of Julia (1.7.1). The type `T` is a bit complex, but the following print statement:

```
CUDA.@cuprintln("[point: $b,$r; up: $bu2,$ru2, plane: $ipl]: A ",
                real(gt2.u11), " ", imag(gt2.u11), " ",
                real(gt2.u12), " ", imag(gt2.u12), " ",
                real(gt2.u13), " ", imag(gt2.u13), " ",
                real(gt2.u21), " ", imag(gt2.u21), " ",
                real(gt2.u22), " ", imag(gt2.u22), " ",
                real(gt2.u23), " ", imag(gt2.u23), " || ",
                real(U[bu2,id1,ru2].u11), " ", imag(U[bu2,id1,ru2].u11), " ",
                real(U[bu2,id1,ru2].u12), " ", imag(U[bu2,id1,ru2].u12), " ",
                real(U[bu2,id1,ru2].u13), " ", imag(U[bu2,id1,ru2].u13), " ",
                real(U[bu2,id1,ru2].u21), " ", imag(U[bu2,id1,ru2].u21), " ",
                real(U[bu2,id1,ru2].u22), " ", imag(U[bu2,id1,ru2].u22), " ",
                real(U[bu2,id1,ru2].u23), " ", imag(U[bu2,id1,ru2].u23))
```

produces, in the latest version Julia 1.7.1:

```
[point: 10,10; up: 14,10, plane: 5]: A 0.115366 0.076161 -0.439492 -0.397009 0.297175 -0.505585 0.547758 0.072546 0.579036 0.520827 0.034789 -0.350083 0.471146 0.218424 || -0.155018 0.208531 0.734900 0.210773 -0.422495 -0.411679 -0.299919 0.233662 -0.273552 -0.565263 0.118974 -0.668538
```

The values on the two sides of `||` should be identical, but they are not. This only happens sometimes (i.e. for some values of `b,r,ipl`), without any discernible pattern. Compared with a C implementation, Julia 1.7.1 produces wrong results, while older versions (1.6.X, 1.5.X) were producing results correct up to machine precision.

I can provide a link to working code that reproduces the bug, but this will **not** be a simple piece of code…

Any advice?

Many thanks!