Scalar indexing using broadcast notation?

I want to convert this to broadcast . notation

        indices = shuffle([(i, j) for i in 1:height for j in 1:width])
    
        for (i, j) in indices  
            img[i, j] = sum(img .- (function.((img .- img[i, j]))))
        end

Is this possible? I want to use the CuArrays functionality, and I am getting a warning from the package that I am using scalar indexing.

Perhaps it’s an unusual operation since it uses a global operation per pixel. Any suggestions would be golden.

Each iteration of the for loop mutates img for the next iteration. There is a lot of intra-loop dependency here. Is this desired? Usually, a new image is used for the destination in such operations.
Additionally, providing code is nice… providing running code is much nicer.

Hi Dan,

Thanks for replying! Yes you are right, it mutates and that’s okay, it executes randomly over the image and it’s okay for this to happen concurrently.

Agree about the data race. One broadcasted implementation is:

tmp = reshape(img, 1, 1, size(img)...)
sum!(img, tmp .- func.(tmp .- img))

from here:

julia> using TensorCast  # my package

julia> @pretty @reduce img[i, j] = sum(k, l) img[k, l] - func(img[k, l] - img[i, j])
begin
    @boundscheck ndims(img) == 2 || throw(ArgumentError("expected a 2-tensor img[i, j]"))
    local starling = transmute(img, Val((nothing, nothing, 1, 2)))
    sum!(img, @__dot__(starling - func(starling - img)))
end

Note that this allocates an intermediate array of size (size(img)..., size(img)...). With CUDA it should be possible to avoid this via lazy broadcasting but not so easy to write out.

Another way is this, which does not use broadcasting:

using Tullio, KernelAbstractions, CUDA
@tullio img[i, j] = img[k, l] - func(img[k, l] - img[i, j])  # implicit sum over k,l
2 Likes

To retain the random ordering of updates:

p = randperm(length(img))
permute!(vec(img), p)
# .... Tullio code on `img` comes here
invpermute!(vec(img), p)

will do the trick, since all the operation depend on either pixel or all image (no neighbourhood effects).