Hello! I’ve run into a strange issue. Running the following code in a Jupyter notebook hangs forever on the last line, A_not_gpu = Array(A):
using KernelAbstractions
using CUDA
@kernel function tile_lu_factor!(A, n)
    I, J, K, L = @index(Global, NTuple)
    for k = 1:2
        if K == 1 && L == 1
            for k = 1:n
                @synchronize
            end
        end
        if K == 1 && L <= 3-k
            for k = 1:n
                @synchronize
            end
        end
    end
end
A = CuArray(rand(2, 2))
backend = get_backend(A)
tile_lu_factor!(backend, (2, 2, 1, 3))(A, 2, ndrange = (2, 2, 1, 3))
A_not_gpu = Array(A)
However, when I replace n with 2 in the inner for loops, the same code finishes:
for k = 1:2
    if K == 1 && L == 1
        for k = 1:2
            @synchronize
        end
    end
    if K == 1 && L <= 3-k
        for k = 1:2
            @synchronize
        end
    end
end
What is going on?