Unexpected behavior of CUDA kernel

Hello all. I hit an unexpected error while writing a CUDA kernel.
Somehow the variable b seems to cause a conflict:
when I remove b = sum(A), the function works.
Is this expected behavior or a bug?

using CUDA

function main()
    function kernel(A, B)
        i = threadIdx().x
        b = B[i]
        A[i] = b
        return nothing
    end

    A = CUDA.zeros(1)
    B = CUDA.zeros(1)
    CUDA.@cuda kernel(A, B)

    b = sum(A)
end
main()
ERROR: GPU compilation of MethodInstance for (::var"#kernel#11")(::CuDeviceArray{Float32, 3, 1}, ::CuDeviceArray{Float32, 3, 1}) failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel#11", which is not isbits:
  .b is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Adding b = sum(A) (or any other assignment of the form b = ...) turns the b inside the kernel into a closed-over variable, which gets boxed as a Core.Box. That is what the CUDA compiler rejects, because the kernel closure is no longer isbits. If you rename the outer variable, e.g. c = sum(A), it should work.
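A minimal sketch of a fixed version (my rewrite, not from the original post; only the host-side variable is renamed, everything else is unchanged):

using CUDA

function main()
    function kernel(A, B)
        i = threadIdx().x
        b = B[i]       # this b is purely local to the kernel
        A[i] = b
        return nothing
    end

    A = CUDA.zeros(1)
    B = CUDA.zeros(1)
    CUDA.@cuda kernel(A, B)

    c = sum(A)         # different name, so kernel no longer closes over a boxed b
    return c
end

main()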

This REPL example illustrates the capture behavior a bit:

julia> function f()
         function g()
           return b
         end
         b = 1
         return g
       end
f (generic function with 1 method)

julia> gg = f(); gg()
1
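
You can also see the boxing directly (my own REPL sketch, not part of the original reply): the captured b becomes a Core.Box field on the closure, which is exactly why the CUDA compiler reports the kernel argument as non-isbits.

julia> fieldtypes(typeof(gg))
(Core.Box,)

julia> isbitstype(typeof(gg))
false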

Thanks. I did not know Julia could capture a variable that is assigned after the function definition.
