Hi,
The following lines of code:
b, r = CUDA.threadIdx().x, CUDA.blockIdx().x
Ush = @cuStaticSharedMem(T, (D,2))
for id1 in N:-1:1
    bu1, ru1 = up((b, r), id1, lp)
    Ush[b,1] = U[b,id1,r]
    for id2 = 1:id1-1
        bu2, ru2 = up((b, r), id2, lp)
        Ush[b,2] = U[b,id2,r]
        sync_threads()
        ipl = ipl + 1
        if ru2 == r
            gt2 = Ush[bu2,1]
        else
            gt2 = U[bu2,id1,ru2]
        end
        # Do some computation with gt2
    end
end
should always store in the variable gt2 the quantity U[bu2,id1,ru2], with the only difference that it takes its value from the shared memory Ush when available (i.e. when ru2 == r).
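To make the intent concrete, here is a minimal CPU-side sketch of the selection logic. It is only a model of what I expect, not the kernel itself: the function name pick, the toy sizes D, N, R, and the use of ComplexF64 in place of T are all assumptions made for illustration.

```julia
# Hypothetical CPU model of the expected behaviour (not the actual GPU kernel).
# `pick` is an illustrative name; ComplexF64 stands in for the real type T.
function pick(U, Ush, bu2, id1, ru2, r)
    # Read from the shared copy when the neighbour lies in the same block,
    # otherwise fall back to global memory.
    return ru2 == r ? Ush[bu2, 1] : U[bu2, id1, ru2]
end

D, N, R = 4, 3, 2                 # toy sizes, chosen arbitrarily
U = rand(ComplexF64, D, N, R)
id1, r = 2, 1

# Mimic every thread b executing Ush[b,1] = U[b,id1,r].
Ush = zeros(ComplexF64, D, 2)
Ush[:, 1] .= U[:, id1, r]

# Whichever branch is taken, the result must equal U[bu2,id1,ru2].
for bu2 in 1:D, ru2 in 1:R
    @assert pick(U, Ush, bu2, id1, ru2, r) == U[bu2, id1, ru2]
end
```

On the CPU both branches trivially agree; the question is why the GPU kernel does not behave the same way.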
Unfortunately this is not the case with Julia 1.7.1, the latest version. The type T is a bit complex, but the following print statement:
CUDA.@cuprintln("[point: $b,$r; up: $bu2,$ru2, plane: $ipl]: A ",
                real(gt2.u11), " ", imag(gt2.u11), " ",
                real(gt2.u12), " ", imag(gt2.u12), " ",
                real(gt2.u13), " ", imag(gt2.u13), " ",
                real(gt2.u21), " ", imag(gt2.u21), " ",
                real(gt2.u22), " ", imag(gt2.u22), " ",
                real(gt2.u23), " ", imag(gt2.u23), " || ",
                real(U[bu2,id1,ru2].u11), " ", imag(U[bu2,id1,ru2].u11), " ",
                real(U[bu2,id1,ru2].u12), " ", imag(U[bu2,id1,ru2].u12), " ",
                real(U[bu2,id1,ru2].u13), " ", imag(U[bu2,id1,ru2].u13), " ",
                real(U[bu2,id1,ru2].u21), " ", imag(U[bu2,id1,ru2].u21), " ",
                real(U[bu2,id1,ru2].u22), " ", imag(U[bu2,id1,ru2].u22), " ",
                real(U[bu2,id1,ru2].u23), " ", imag(U[bu2,id1,ru2].u23))
produces, with Julia 1.7.1:
[point: 10,10; up: 14,10, plane: 5]: A 0.115366 0.076161 -0.439492 -0.397009 0.297175 -0.505585 0.547758 0.072546 0.579036 0.520827 0.034789 -0.350083 0.471146 0.218424 || -0.155018 0.208531 0.734900 0.210773 -0.422495 -0.411679 -0.299919 0.233662 -0.273552 -0.565263 0.118974 -0.668538
The two groups of numbers separated by || should be identical, but they are not. This only happens sometimes (i.e. for some values of b, r, ipl), without any illuminating pattern. Compared with a C implementation, Julia 1.7.1 produces wrong results, while older versions (1.6.X, 1.5.X) were producing results correct up to machine precision.
I can provide a link to working code that reproduces the bug, but it will not be a simple piece of code…
Any advice?
Many thanks!