[ANN] MPIHaloArrays.jl

MPIHaloArrays.jl is a new package that provides the MPIHaloArray array type, a subtype of AbstractArray, which facilitates the exchange of halo (or ghost) cells commonly used in domain-decomposition methods for high-performance computing problems. In my case, I use this method to solve the compressible Euler equations of fluid dynamics in parallel. While there are existing libraries with similar decomposition functionality, such as MPIArrays.jl, PencilArrays.jl, and ImplicitGlobalGrid.jl, I wanted to create an array type specifically for this task. Each library provides only a portion of the functionality I needed, so I decided to try creating a new one! I’ve done this type of parallelization in Fortran 2018 with coarrays, but have recently written a multi-threaded, multi-physics code in Julia (compressible hydro + thermal conduction) for my dissertation. MPIHaloArrays will form the basis for the transition to domain-decomposed parallelization.

Domain decomposition splits a large problem into small subdomains that live on different MPI ranks. Stencil operations commonly used in solving PDEs require neighbor information, and this is provided by halo cells along the edges of each subdomain. In the image below, the domain is decomposed into 4 subdomains, with a single layer of halo cells on all sides (process 4 is shown exchanging orthogonal neighbor information with processes 3 and 2). Domain decomposition can happen in 1D, 2D, and 3D.
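As a serial illustration of the idea (plain Julia, no MPI, and not the MPIHaloArrays API — just the concept): each subdomain is stored with an extra ring of halo cells, the halo is filled with the neighbors' edge data, and the stencil then sweeps the interior with no boundary special-casing.

```julia
# Serial sketch of the halo-cell idea (no MPI involved).
nhalo = 1
ni, nj = 4, 4                        # interior size of this "subdomain"
A = zeros(ni + 2nhalo, nj + 2nhalo)  # storage includes the halo ring
A[2:end-1, 2:end-1] .= 1.0           # interior data owned by this rank

# In a real run a neighbor exchange fills the halo; here we fill it by hand.
A[1, :] .= 1.0; A[end, :] .= 1.0
A[:, 1] .= 1.0; A[:, end] .= 1.0

# 5-point stencil over the interior only -- the halo supplies the
# out-of-subdomain neighbors, so no index checks are needed.
B = similar(A)
for j in 2:size(A, 2)-1, i in 2:size(A, 1)-1
    B[i, j] = (A[i, j] + A[i+1, j] + A[i-1, j] + A[i, j+1] + A[i, j-1]) / 5
end
```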


Features

  • Provides the MPIHaloArray array type
  • Domain decomposition can happen in 1D, 2D, or 3D
  • Neighbor exchange with arbitrarily sized halo cell regions (note: the number of halo cells is currently fixed across all dimensions)
  • Communication is currently handled by MPI.Isend and MPI.Irecv! underneath, but future versions will add the option for one-sided communication with MPI.Put and MPI.Get
  • Halo exchange can either be orthogonal-only (e.g. [i+1,j,k]) or include corner information as well (e.g. [i+1,j+1,k])


Future work

  • GPU testing – I don’t currently have a local GPU to test with, but future work will include GPU-aware MPI
  • Remove the restriction to 1D, 2D, or 3D arrays. For example, an array U could have dimensions [q,i,j], where halo exchanges are done only on i and j.
  • Optimization of neighbor communication (caching, etc…)
  • Finish one-sided implementation



Example

using MPI, MPIHaloArrays

MPI.Init()

const comm = MPI.COMM_WORLD
const rank = MPI.Comm_rank(comm)
const nprocs = MPI.Comm_size(comm)
const root = 0 # MPI ranks are 0-based; root rank for scatter/gather

# Create a topology type that facilitates neighbor awareness and global size
topology = CartesianTopology(comm, 
                             [4,2], # domain decompose in a 4x2 grid of ranks
                             [false, false]) # no periodic boundaries

nhalo = 2 # number of halo cell layers on each side

# Create some data on the local rank
local_data = rand(10,20) * rank
A = MPIHaloArray(local_data, topology, nhalo)
fillhalo!(A, -1)

# Perform halo exchange
updatehalo!(A)

# Or start with a large global array
B_global = rand(512,512)

# divide among the ranks and return B_local::MPIHaloArray
B_local = scatterglobal(B_global, root, nhalo, topology;
                        do_corners = false) # if your algorithm doesn't need corner
                                            # info, this will save communication
# do some work
# ....

# and pull it back to the root rank
B_result = gatherglobal(B_local; root=root)


2D Diffusion

See the example here in the GitHub repository.
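At its core, that example is an explicit stencil sweep between halo updates. Here is a minimal serial sketch of the update kernel; the function name and the α, dt, dx values are illustrative (not the package's API), and in the MPI version a halo exchange would precede each sweep.

```julia
# Explicit 2D diffusion step on one subdomain with a 1-cell halo.
# In the parallel version, a halo exchange runs before each sweep so
# that the stencil's out-of-subdomain neighbors are up to date.
function diffuse!(unew, u, α, dt, dx)
    for j in 2:size(u, 2)-1, i in 2:size(u, 1)-1
        unew[i, j] = u[i, j] + α * dt / dx^2 *
            (u[i+1, j] + u[i-1, j] + u[i, j+1] + u[i, j-1] - 4u[i, j])
    end
    return unew
end

u = zeros(10, 10)
u[5:6, 5:6] .= 1.0                 # initial hot spot
unew = copy(u)
diffuse!(unew, u, 1.0, 0.1, 1.0)   # one step; α*dt/dx^2 = 0.1 is stable (≤ 0.25)
```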


See the docs here

I’d love to hear your feedback or contributions!