Current array index inside broadcast?

I’m broadcasting over 2D arrays so that the same code runs on the GPU when GPUArrays are passed in.

I often need the current index, both to look up values in other arrays and to reach locations relative to the current element. Any ideas on the easiest, lowest-overhead way to get it?

Here is my first attempt, allowing access to any index of the source array:

# Materialize the row and column index of every element as h×w GPU arrays
rows = CuArray([row for row in 1:h, col in 1:w])
cols = CuArray([col for row in 1:h, col in 1:w])

...

# (source,) wraps the array in a 1-tuple so broadcast hands it to f whole
broadcast!(f, dest, source, rows, cols, (source,))

It works fine, but it is a lot of allocation if you have to do it regularly.
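
One direction that might avoid the index arrays, as a minimal sketch: broadcast over `CartesianIndices(source)`, which is a lazy array of indices rather than a materialized one. The kernel `f` below is hypothetical (a clamped right-neighbour average, chosen only for illustration), and the example uses a plain `Array` so it runs without a GPU. Whether the same call behaves identically with a `CuArray` is an assumption here and would need checking against your GPU package version.

```julia
# Hypothetical kernel: average of an element and its right-hand neighbour,
# clamped at the last column. `src` is passed whole so any index can be read.
function f(x, I, src)
    row, col = Tuple(I)                  # unpack the CartesianIndex
    right = min(col + 1, size(src, 2))   # stay inside the array
    return (x + src[row, right]) / 2
end

h, w = 3, 4
source = Float32.(reshape(1:h*w, h, w))
dest = similar(source)

# CartesianIndices(source) has the same shape as `source` but stores only
# the axis ranges, so no h×w index arrays are allocated.
# (source,) is a 1-tuple, so broadcast treats the array as a single value.
broadcast!(f, dest, source, CartesianIndices(source), (source,))
```

The index object is rebuilt cheaply on each call, so the per-call cost of `rows` and `cols` goes away. If separate row and column arguments are preferred, broadcasting over `1:h` and `(1:w)'` should give the same 2D shape, again without materializing anything.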