The most common way to do this in parallel computing is not by shifting the array, but rather by using “ghost cells” — make redundant copies of boundary data in a communication step, after which point the operations on each “chunk” of the array can occur in parallel. See e.g. Principles of Parallel Programming by Lin and Snyder, quoted here: Seemingly unnecessary allocation within for loop - #9 by ptoche and the GitHub - fverdugo/PartitionedArrays.jl: Vectors and sparse matrices partitioned into pieces for parallel distributed-memory computations. package (though this is not GPU-based).
1 Like