Is the CPU parallelism in the ParallelStencil.jl Concise single/multi-XPU miniapps limited to threads, or, since it’s built with ImplicitGlobalGrid, can it use MPI as well? I don’t see anything that looks like MPI support, but I’m not an expert. If the code currently only uses threads, how difficult would it be to add MPI?
ImplicitGlobalGrid uses MPI for domain decomposition with halo exchange. The global domain is split up behind the scenes with the init_global_grid() call. If you dig around in the ImplicitGlobalGrid source code you’ll see where it calls MPI.
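To make that concrete, here is a minimal sketch of the call sequence, assuming ImplicitGlobalGrid is installed together with a working MPI. The grid size, the array T and the function name main are illustrative choices, not taken from the miniapps:

```julia
using ImplicitGlobalGrid

function main()
    nx, ny, nz = 32, 32, 32   # LOCAL grid size per process (illustrative values)

    # Initializes MPI and sets up the Cartesian process topology;
    # the global grid is defined implicitly from the local sizes.
    me, dims, nprocs = init_global_grid(nx, ny, nz)

    # Size of the implicit global grid, queried before finalizing.
    global_size = (nx_g(), ny_g(), nz_g())

    # Each process owns only its local array; fill it with the process rank.
    T = fill(Float64(me), nx, ny, nz)

    # Exchange boundary (halo) values with neighboring processes via MPI.
    update_halo!(T)

    if me == 0
        println("processes per dimension: ", dims, ", global grid: ", global_size)
    end

    # Finalizes the global grid (and, by default, MPI).
    finalize_global_grid()
    return (me=me, dims=dims, nprocs=nprocs, global_size=global_size)
end

result = main()
```

Run as a plain Julia script this uses a single process; to use several, start it through your MPI launcher, for example `mpiexec -n 4 julia sketch.jl` (the file name is hypothetical and the exact launcher depends on your MPI setup).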
If you use ParallelStencil in combination with ImplicitGlobalGrid, then you will be able to launch your application on multiple processes (CPU/GPU) - ImplicitGlobalGrid relies on MPI for inter-process communication. In the GitHub README we have written an overview of ImplicitGlobalGrid, which should answer all your initial questions:
Function documentation is callable from the REPL:
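As a small illustration, assuming ImplicitGlobalGrid is installed: typing `?` at the julia> prompt switches to help mode, and the `@doc` macro fetches the same docstring programmatically. The choice of update_halo! here is only an example:

```julia
using ImplicitGlobalGrid

# At the interactive prompt you would type `?update_halo!` instead.
# `@doc` returns the same docstring as a value that can be displayed.
docs = @doc update_halo!
display(docs)
```

The same works for the other exported functions, such as init_global_grid.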
Furthermore, my talk at JuliaCon 2020 gives an introduction to ParallelStencil and ImplicitGlobalGrid:
Finally, last year’s workshop “Solving differential equations in parallel on GPUs | Workshop | 2021” also discusses using ParallelStencil with ImplicitGlobalGrid (towards the end, I think):
Do not hesitate to ask if something remains unclear…