CartesianIndices, Convolution and other "moving window" computations on GPU

Hey, I want to move some code to the GPU that's heavy on image processing (convolution & similar stuff).
On the CPU I was doing it manually with a for loop over CartesianIndices, so I'm curious whether that's the right approach for the GPU as well, or if there are other solutions for this scenario:
I've got an array (2D with a single color channel, or 3D with multiple color channels) and I want to create a new image by computing every pixel from a given n×n neighborhood in the original image. This can be as simple as a linear combination of the neighborhood, or as complex as counting the values around the pixel and returning the one that shows up most often.
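For reference, my current CPU version looks roughly like this (a simplified sketch; the function name `mapwindow_naive` and the clamping-at-the-edges behavior are just for illustration):

```julia
# Generic n×n moving-window pass over a 2D array, parameterised by a
# function `f` that reduces each window to a single value (e.g. a
# linear combination, or a mode filter).
function mapwindow_naive(f, img::AbstractMatrix, n::Int)
    r = n ÷ 2                       # window radius (n assumed odd)
    out = similar(img, float(eltype(img)))
    R = CartesianIndices(img)
    Ifirst, Ilast = first(R), last(R)
    I1 = oneunit(Ifirst)            # CartesianIndex(1, 1)
    for I in R
        # Clamp the window to the array bounds near the edges
        window = @view img[max(Ifirst, I - r*I1):min(Ilast, I + r*I1)]
        out[I] = f(window)
    end
    return out
end

# Example: a 3×3 mean filter
A = Float64.(reshape(1:16, 4, 4))
B = mapwindow_naive(w -> sum(w) / length(w), A, 3)
```

So every output pixel only reads a small neighborhood of the input, which seems like it should map well to a GPU kernel, but I'm not sure how to express it.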