You can also look at PencilArrays for yet another approach to arrays that can be distributed along one or more dimensions using MPI.
PencilArrays are currently part of the PencilFFTs package, but can be used independently of the parallel FFT functionality. I’m planning to split them into a separate package soon.
That also looks interesting, thanks! Does PencilArrays.jl support broadcasting and calling custom MPI kernels? If so, do you have an example for the latter?
PencilArrays is cool.
At the lower level, MPITopology uses MPI_Cart_create to define a Cartesian MPI communicator.
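To make the Cartesian layout concrete, here is a minimal sketch in plain Python (no MPI needed) of how ranks map to grid coordinates under the row-major numbering that MPI uses for Cartesian topologies. The helper names cart_coords and cart_rank are made up for illustration and only mirror what MPI_Cart_coords and MPI_Cart_rank would report for a non-periodic grid; they are not part of PencilArrays.

```python
def cart_coords(rank, dims):
    """Coordinates of `rank` in a row-major Cartesian grid of shape `dims`.

    Hypothetical helper: mirrors the row-major rank numbering of MPI
    Cartesian communicators, without calling MPI.
    """
    if not 0 <= rank < _size(dims):
        raise ValueError("rank out of range for this grid")
    coords = []
    # Peel off the fastest-varying (last) dimension first.
    for extent in reversed(dims):
        rank, c = divmod(rank, extent)
        coords.append(c)
    return tuple(reversed(coords))


def cart_rank(coords, dims):
    """Inverse of cart_coords: row-major rank of the given coordinates."""
    rank = 0
    for c, extent in zip(coords, dims):
        if not 0 <= c < extent:
            raise ValueError("coordinate out of range")
        rank = rank * extent + c
    return rank


def _size(dims):
    """Total number of processes in the grid."""
    n = 1
    for extent in dims:
        n *= extent
    return n


if __name__ == "__main__":
    dims = (2, 4)  # 2 x 4 process grid, i.e. 8 MPI processes
    for r in range(_size(dims)):
        print(r, cart_coords(r, dims))
```

Note that if reordering is allowed when the communicator is created, a process's rank in the new communicator may differ from its rank in the original one; the mapping above applies to ranks in the Cartesian communicator.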
I guess this is nothing new really, but you can submit a job to a cluster and use information from the job's environment variables (number of nodes, number of CPU cores per node, etc.) to set up these communicators.
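As a rough sketch of that idea, the snippet below reads the process count from the job environment and picks a nearly square 2D process grid, which could then be passed on as the dimensions of the Cartesian communicator. It assumes a Slurm scheduler, which exports SLURM_NTASKS; other schedulers use different variable names. The functions balanced_grid and grid_from_job_env are hypothetical helpers, not part of any package mentioned here.

```python
import math
import os


def balanced_grid(nprocs):
    """Most nearly square pair (p1, p2) with p1 * p2 == nprocs and p1 <= p2."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    p1 = math.isqrt(nprocs)
    # Walk down from floor(sqrt(nprocs)) to the nearest divisor.
    while nprocs % p1 != 0:
        p1 -= 1
    return p1, nprocs // p1


def grid_from_job_env(env=os.environ):
    """Process grid derived from the job environment.

    Assumption: the job runs under Slurm, which sets SLURM_NTASKS.
    Falls back to a single process when the variable is absent.
    """
    nprocs = int(env.get("SLURM_NTASKS", "1"))
    return balanced_grid(nprocs)


if __name__ == "__main__":
    print(grid_from_job_env())
```

In a real application it is usually safer to take the process count from the MPI communicator itself and use the environment only for extra hints, such as the number of cores per node.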
Not really related, but I have set up ‘bladesets’ on HPC clusters where the jobs ‘prefer’ to run on nodes close to each other in the topology.
I’m not sure if broadcasting fully works, but that would be really easy to add if it doesn’t.
As for custom MPI kernels, yes, it is possible, and it works pretty much the same way as in a C or Fortran MPI application. Indexing a PencilArray, with either linear or Cartesian indices, always yields a value in the local part of the array.
For some examples you can look at the docs, where different approaches for iterating over arrays are compared.
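To illustrate what "local part" means, here is a small language-agnostic sketch in plain Python of a block decomposition along one dimension: each process owns a contiguous range of global indices, and a local index is an offset into that range. The partitioning rule below (spread the remainder over the first processes) is a common convention and an assumption on my side, not necessarily what PencilArrays does internally; the function names are hypothetical and the indices are 0-based, unlike Julia's.

```python
def local_range(n, nprocs, coord):
    """Half-open global index range [start, stop) owned by process `coord`.

    Assumed convention: the first `n % nprocs` processes get one extra
    element each, so block sizes differ by at most one.
    """
    if not 0 <= coord < nprocs:
        raise ValueError("coord out of range")
    base, extra = divmod(n, nprocs)
    start = coord * base + min(coord, extra)
    stop = start + base + (1 if coord < extra else 0)
    return start, stop


def local_to_global(i_local, n, nprocs, coord):
    """Global index corresponding to local index `i_local` on process `coord`."""
    start, stop = local_range(n, nprocs, coord)
    if not 0 <= i_local < stop - start:
        raise IndexError("local index outside the local block")
    return start + i_local


if __name__ == "__main__":
    n, nprocs = 10, 4  # 10 global points split over 4 processes
    for coord in range(nprocs):
        print(coord, local_range(n, nprocs, coord))
```

A kernel written against local indices then looks like ordinary serial code over the local block, and the global index is only needed when the computation depends on the position in the full domain.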