Generic Kernels for CLArrays

I’d like to take the maximum of a scalar field z subject to some mask d. To illustrate, z is a checkerboard pattern and d is a circle centered at c_xy.

N = 10
xy = [(i, j) for i=1:N, j=1:N] # coordinates
z = [(i+j)%2 + 7 for i=1:N, j=1:N] # checkerboard

# render z masked by a circle
function render(xy, z)
    function d(xy, z)
        c_xy = (5,5)
        r = (xy[1] - c_xy[1])^2 + (xy[2] - c_xy[2])^2
        mask = (sqrt(Float32(r)) < 3.5)
        return mask * z
    end
    return map(d, xy, z)
end

# render z masked by a circle at c_xy
function render_c(xy, z, c_xy)
    function d(xy, z)
        r = (xy[1] - c_xy[1])^2 + (xy[2] - c_xy[2])^2
        mask = (sqrt(Float32(r)) < 3.5)
        return mask * z
    end
    return map(d, xy, z)
end

# get maximum of z within the mask
function query_c(xy, z, c_xy)
    function d(xy, z)
        r = (xy[1] - c_xy[1])^2 + (xy[2] - c_xy[2])^2
        mask = (sqrt(Float32(r)) < 3.5)
        return mask * z
    end
    #return maximum(map(d, xy, z))
    return mapreduce(xyz->d(xyz[1], xyz[2]), max, zip(xy, z))
end

render runs fine on the GPU through the GPUArrays interface of CLArrays. Two things remain:

1. I wish there were a way to avoid constructing xy and just get the indices.
2. I don’t know how to get render_c working, because it relies on d closing over the value c_xy. I could construct an array full of identical c_xy values, but I’m hoping for a better solution (for example, setting the stride of some array to 0 in all dimensions).
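To make both points concrete, here is a minimal CPU-only sketch of the shape I am after, assuming Julia 1.x. The name `masked` is made up for illustration. `CartesianIndices(z)` supplies the indices without materializing an xy array, and `Ref(c_xy)` makes broadcast treat the center as a scalar, so nothing is closed over. This runs on a plain Array; whether the CLArrays broadcast accepts the same arguments is an assumption I have not tested.

```julia
# Hypothetical helper: the circle center is an explicit argument, not a closure.
function masked(idx, z, c_xy)
    r2 = (idx[1] - c_xy[1])^2 + (idx[2] - c_xy[2])^2
    return (sqrt(Float32(r2)) < 3.5f0) * z
end

N = 10
z = [(i + j) % 2 + 7 for i = 1:N, j = 1:N]  # checkerboard
c_xy = (5, 5)

# CartesianIndices(z) is a lazy array of indices, so no xy array is allocated.
# Ref(c_xy) tells broadcast to pass the whole tuple to every element.
rendered = masked.(CartesianIndices(z), z, Ref(c_xy))
peak = maximum(rendered)
```

Here `rendered` plays the role of the output of render_c and `peak` the role of query_c.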

Do I need to write a custom kernel? I managed that once for CuArrays, using https://mikeinnes.github.io/2017/08/24/cudanative.html as a guide. Are there examples of generic kernels for CLArrays?

Sure, there are lots of generic kernels in GPUArrays, which work with both CuArrays and CLArrays:
https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/base.jl#L102
https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/linalg.jl#L26
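As a rough illustration of the general shape of such a kernel, here is a CPU-only sketch, assuming Julia 1.x. It does not use the GPUArrays API: `mask_kernel!` and `launch_cpu!` are hypothetical names, and the loop only stands in for a real kernel launch. The point is that each invocation handles one element, and every input, including c_xy, is passed as an explicit argument rather than captured by a closure.

```julia
# Hypothetical per-element kernel: handles the element at linear index i.
function mask_kernel!(out, z, c_xy, i)
    I = CartesianIndices(z)[i]  # recover (row, col) from the linear index
    r2 = (I[1] - c_xy[1])^2 + (I[2] - c_xy[2])^2
    out[i] = (sqrt(Float32(r2)) < 3.5f0) * z[i]
    return nothing
end

# Hypothetical CPU stand-in for a kernel launch: one call per element.
function launch_cpu!(kernel!, out, args...)
    for i in eachindex(out)
        kernel!(out, args..., i)
    end
    return out
end

N = 10
z = [(i + j) % 2 + 7 for i = 1:N, j = 1:N]  # checkerboard
out = launch_cpu!(mask_kernel!, similar(z), z, (5, 5))
```

On a GPU the loop would be replaced by the backend's launch mechanism and the index would come from the thread state. The linked sources show how GPUArrays actually does that.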

I also wrote this blog post a while ago: