If you use ParallelStencil in combination with ImplicitGlobalGrid, you will be able to launch your application on multiple processes (CPU/GPU); ImplicitGlobalGrid relies on MPI for inter-process communication. The GitHub README contains an overview of ImplicitGlobalGrid, which should answer all your initial questions:
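To give a rough idea of what the combination looks like in practice, here is a minimal sketch of a 3-D heat diffusion solver. The physical parameters, grid sizes, and the function names `diffusion_step!` and `diffusion3D` are made up for illustration; the `Threads` backend is assumed so that it runs on CPUs (a GPU backend such as `CUDA` would be selected in `@init_parallel_stencil` instead).

```julia
using ImplicitGlobalGrid, ParallelStencil
using ParallelStencil.FiniteDifferences3D
@init_parallel_stencil(Threads, Float64, 3)  # assumption: CPU threads backend

# Stencil kernel: one explicit diffusion step on the inner points of T2.
@parallel function diffusion_step!(T2, T, lam, dt, dx, dy, dz)
    @inn(T2) = @inn(T) + dt*lam*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2)
    return
end

function diffusion3D()
    lam        = 1.0               # illustrative diffusivity
    lx, ly, lz = 10.0, 10.0, 10.0  # illustrative domain size
    nx, ny, nz = 64, 64, 64        # LOCAL grid size of each process
    nt         = 100               # number of time steps

    # Initializes MPI and the implicit global grid (Cartesian process topology).
    me, dims = init_global_grid(nx, ny, nz)

    # Grid spacing is derived from the GLOBAL grid size.
    dx, dy, dz = lx/(nx_g()-1), ly/(ny_g()-1), lz/(nz_g()-1)
    dt = min(dx^2, dy^2, dz^2)/lam/8.1

    T  = @rand(nx, ny, nz)         # arbitrary initial field
    T2 = copy(T)

    for it = 1:nt
        @parallel diffusion_step!(T2, T, lam, dt, dx, dy, dz)
        update_halo!(T2)           # exchange boundary values with neighbouring processes
        T, T2 = T2, T
    end

    finalize_global_grid()         # finalizes MPI as well
    return
end

diffusion3D()
```

Assuming the script is saved as `diffusion3D.jl` (a hypothetical file name), it would typically be launched with something like `mpiexec -n 4 julia --project diffusion3D.jl`; the exact launcher command depends on your MPI installation.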
The function documentation is also available from the REPL:
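As a small sketch of how that lookup might go, assuming ImplicitGlobalGrid is installed in the active environment: in the REPL you would press `?` to enter help mode and type a name; the same docstrings can also be retrieved with `@doc`.

```julia
using ImplicitGlobalGrid

# Interactive use: press `?` at the julia> prompt to switch to help mode, then e.g.
#   help?> ImplicitGlobalGrid
#   help?> init_global_grid

# Programmatic equivalent: @doc returns the docstring, display prints it.
display(@doc init_global_grid)
display(@doc update_halo!)
```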
Furthermore, my talk at JuliaCon 2020 gives an introduction to ParallelStencil and ImplicitGlobalGrid:
Finally, last year’s workshop “Solving differential equations in parallel on GPUs | Workshop | 2021” also discusses the usage of ParallelStencil with ImplicitGlobalGrid (I think towards the end):
Do not hesitate to ask if something remains unclear…