KernelAbstractions nested task error: UndefVarError: variable not defined

Hey,

I am running into a strange issue with KernelAbstractions. Consider this code:

using KernelAbstractions
using CUDA
using CUDAKernels

@kernel synctesttt(epoch) = begin
	I = @index(Global)
	eex = 1
	while eex <= epoch
		@print("what\n")
		@synchronize()
		@print("huhh\n")
		eex += 1
	end
end

     
@time kernFIXG = synctesttt(CPU())
@time wait(kernFIXG(3, ndrange=10)) 
@time wait(kernFIXG(3, ndrange=10)) 
@time wait(kernFIXG(3, ndrange=10))

Why does it give me this error?

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] wait
   @ ~/.julia/packages/KernelAbstractions/plICi/src/cpu.jl:65 [inlined]
 [3] wait (repeats 2 times)
   @ ~/.julia/packages/KernelAbstractions/plICi/src/cpu.jl:29 [inlined]
 [4] top-level scope
   @ ./timing.jl:220 [inlined]
 [5] top-level scope
   @ ~/repo/test/tests/speedtests/test_gpu_cpu_revise.jl:0

    nested task error: UndefVarError: eex not defined
    Stacktrace:
     [1] overdub
       @ ~/.julia/packages/KernelAbstractions/plICi/src/macros.jl:266 [inlined]
     [2] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_synctesttt)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{Int64}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
       @ KernelAbstractions ~/.julia/packages/KernelAbstractions/plICi/src/cpu.jl:157
     [3] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_synctesttt)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{Int64}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
       @ KernelAbstractions ~/.julia/packages/KernelAbstractions/plICi/src/cpu.jl:130
     [4] (::KernelAbstractions.var"#37#38"{Nothing, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_synctesttt)}, Tuple{Int64}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, Tuple{Int64}, KernelAbstractions.NDIteration.DynamicCheck}})()
       @ KernelAbstractions ~/.julia/packages/KernelAbstractions/plICi/src/cpu.jl:22
in expression starting at /home/master/repo/test/tests/speedtests/test_gpu_cpu_revise.jl:85

My impression is that @synchronize() is what breaks this.

This is a minimal example of the problem, but with @synchronize many other things go wrong as well. In my other, more complex kernel, execution doesn't even enter the while loop.