GPU sum closure throwing an error

I’m trying to streamline some code using KernelAbstractions and it’s almost working perfectly. I’m getting hung up when I try to write a sum closure like this within the kernel:

using KernelAbstractions
using CUDA
u = CUDA.rand(Float32,5,5,2)
div = CuArray{Float32}(undef,4,4)
δ(j) = (j==1) ? CartesianIndex((1,0)) : CartesianIndex((0,1))
@kernel function k_divergence(div, u)
    I = @index(Global, Cartesian)
    div[I] = sum(u[I+δ(j),j]-u[I,j] for j in eachindex(axes(div)))
end
divergence!(div,u) = k_divergence(CUDABackend(), 64)(div, u, ndrange=size(div))
divergence!(div,u)

This throws an InvalidIRError: “Reason: unsupported dynamic function invocation (call to mapreduce_empty)”. I tried using Cthulhu to track down the error, but everything looks “blue” other than:

(::Core.var"#Any##kw")(::Any, obj::KernelAbstractions.Kernel{CUDABackend}, args...) in CUDA.CUDAKernels at C:\Users\gweymouth\.julia\packages\CUDA\N71Iw\src\CUDAKernels.jl:102
┌ Warning: couldn't retrieve source of (::Core.var"#Any##kw")(::Any, obj::KernelAbstractions.Kernel{CUDABackend}, args...) in CUDA.CUDAKernels at C:\Users\gweymouth\.julia\packages\CUDA\N71Iw\src\CUDAKernels.jl:102
└ @ TypedSyntax C:\Users\gweymouth\.julia\packages\TypedSyntax\HNHOE\src\node.jl:31

If I expand the sum out into a for loop it works fine, but I’m sure a base function like sum is supported, so I must just be doing something wrong.
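For reference, this is the expanded for-loop version that does work (a sketch, using the δ helper defined above):

```julia
@kernel function k_divergence(div, u)
    I = @index(Global, Cartesian)
    # Accumulate the divergence with a scalar loop instead of sum
    s = zero(eltype(div))
    for j in eachindex(axes(div))
        s += u[I+δ(j), j] - u[I, j]
    end
    div[I] = s
end
```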


This kernel works:

@kernel function k_divergence(div, u)
    I = @index(Global, Cartesian)
    div[I] = sum(j -> u[I+δ(j),j]-u[I,j], eachindex(axes(div)), init=zero(eltype(div)))
end

Does KA not support the other form of sum?

KA.jl is not involved in the compilation of sum. You’re just expecting this sum invocation to get compiled (by Julia & GPUCompiler) down to something that’s GPU-compatible, which is generally not something you can rely on. The generator form has to handle a possibly empty iterator through Base.mapreduce_empty, which is the dynamic call in your error; passing init gives the reduction a concrete starting value, so that fallback is never needed. Kernels are meant to contain scalar code, not vectorized functions.
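For what it’s worth, one way to debug cases like this is to run the same kernel on KernelAbstractions’ CPU backend first, where failures surface as ordinary Julia errors instead of an InvalidIRError (a sketch, reusing the kernel and δ from above):

```julia
using KernelAbstractions

# Plain Arrays instead of CuArrays
u_cpu = rand(Float32, 5, 5, 2)
div_cpu = Array{Float32}(undef, 4, 4)

# Launch on the CPU backend; same kernel, same ndrange
k_divergence(CPU(), 64)(div_cpu, u_cpu, ndrange=size(div_cpu))
```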


Ok - that’s good to know.