Each iteration of the for loop mutates img, so later iterations read values that earlier iterations have already overwritten; there is a lot of intra-loop dependency here. Is this intended? Usually such operations write into a new image as the destination.
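For illustration, a minimal sketch of a non-mutating version, assuming the loop computes the double sum over k, l that appears in the expressions below (func and the array sizes are placeholders from the question):

out = similar(img)
for i in axes(img, 1), j in axes(img, 2)
    acc = zero(eltype(img))
    for k in axes(img, 1), l in axes(img, 2)
        acc += img[k, l] - func(img[k, l] - img[i, j])   # read only from img, write only to out
    end
    out[i, j] = acc
end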
Additionally, providing code is nice… providing running code is much nicer.
julia> using TensorCast # my package
julia> @pretty @reduce img[i, j] = sum(k, l) img[k, l] - func(img[k, l] - img[i, j])
begin
    @boundscheck ndims(img) == 2 || throw(ArgumentError("expected a 2-tensor img[i, j]"))
    local starling = transmute(img, Val((nothing, nothing, 1, 2)))
    sum!(img, @__dot__(starling - func(starling - img)))
end
Note that this allocates an intermediate array of size (size(img)..., size(img)...). With CUDA it should be possible to avoid this via lazy broadcasting, but that is not so easy to write out.
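To get a feeling for how big that intermediate is, a rough estimate assuming a hypothetical 256×256 Float32 image:

julia> (256^2)^2 * sizeof(Float32) / 1e9   # GB for the (256, 256, 256, 256) intermediate
17.179869184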
Another way is this, which does not use broadcasting:
using Tullio, KernelAbstractions, CUDA
@tullio img[i, j] = img[k, l] - func(img[k, l] - img[i, j]) # implicit sum over k,l
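If overwriting img in place is not what you want (per the first point above), := tells Tullio to allocate a fresh output array instead. A hedged sketch, with a placeholder func and input, assuming func is a scalar function that compiles for the GPU:

using Tullio, KernelAbstractions, CUDA

func(x) = x^2                          # placeholder, stands in for the real function
img_gpu = CUDA.rand(Float32, 64, 64)   # hypothetical input

# := allocates `out`; img_gpu is only read, never written
@tullio out[i, j] := img_gpu[k, l] - func(img_gpu[k, l] - img_gpu[i, j])  # implicit sum over k, l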