ERROR: LoadError: UndefVarError: `local_d` not defined in `Main`

Welcome to the Julia community!

Please read PSA: how to quote code with backticks to improve your formatting.


I’ve tried to run your code with CUDABackend(). Apart from changing a ROCArray into a CuArray I also had to

  • change temp = @localmem(Int32, group_sz) into temp = @localmem(Int32, GROUP_SIZE)
  • remove the wait(total_found)
  • add KADevice as the first argument of the allocation, i.e. output_indices = KernelAbstractions.zeros(KADevice, Int32, count).

Then I get Int32[3, 4, 6, 8], which I assume is the desired output.


Based on the documentation, I can’t really tell what the point of @private is. Coming from CUDA.jl, I don’t see why you couldn’t just write local_d = 1. And indeed you can: the CUDABackend() code runs perfectly fine that way. But @private does seem to matter when using CPU() as the backend.

The issue when using @private local_d = 1 turns out to be in the local_d *= 2 line. Seemingly, local_d across threads is represented as an NTuple{256, Int64} (with 256 == @groupsize()), and things break down at the (attempted) reassignment: tuples are immutable, so there is no setindex! method for them. So an MWE for the issue is

julia> using KernelAbstractions

julia> @kernel function kern()
           @private var = 1
           var *= 2
       end

julia> kern(CPU(), 1, 1)()
ERROR: MethodError: no method matching setindex!(::Tuple{Int64}, ::Int64, ::Int64)
(...)

This looks like a bug to me.
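
In the meantime, here is a minimal sketch of the plain-local variant mentioned above, run on CPU(). The kernel name kern_plain! and the out array are made up for illustration and are not from your code. I'm also assuming there is no @synchronize between the assignment and the later use of var; as far as I understand, that is the case where @private becomes necessary on the CPU backend, so treat this as a workaround for simple kernels rather than a general replacement.

```julia
using KernelAbstractions

# Hypothetical kernel: uses a plain local variable instead of `@private var = 1`.
@kernel function kern_plain!(out)
    i = @index(Global)
    var = 1        # ordinary local, one per work-item
    var *= 2       # reassignment is fine for a plain local
    out[i] = var
end

backend = CPU()
out = zeros(Int, 4)

# Workgroup size 4, one work-item per element of `out`.
kern_plain!(backend, 4)(out, ndrange = length(out))
KernelAbstractions.synchronize(backend)
```

After synchronize returns, every element of out should hold the doubled value.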