I’m happy to announce PencilFFTs.jl, a Julia package for performing parallel fast Fourier transforms using MPI.
This package is an attempt to bring parallel FFTs to Julia. It is well adapted for running on thousands of cores in CPU computing clusters (support for GPUs may arrive in the future!). The package uses a similar strategy as “standard” C/C++/Fortran codes that are commonly used in research. Namely, it decomposes a multidimensional array along one or more dimensions, so that each MPI process handles one of the resulting portions of the array. This is illustrated by the image below, which shows the 2D decomposition of a 3D array over 12 MPI processes. This particular configuration is sometimes called “pencil” decomposition, hence the name of this package.
Given a decomposition, each process can perform serial FFTs along the non-decomposed dimensions. Then, to transform along the remaining dimensions, one has to redistribute (or transpose) data among MPI processes as shown in the image. The method is described in a bit more detail in the package documentation.
This kind of approach is commonly used in spectral methods that may use Fourier or Chebyshev expansions to decompose the solution of the governing equations. Applications in physics include solvers for the Navier–Stokes equations to describe fluid turbulence in periodic or wall-bounded domains, and for the Gross–Pitaevskii equation describing Bose–Einstein condensates. See also this very relevant discussion here on Discourse from a few years ago.
As shown in the benchmarks, the performance and scalability of PencilFFTs are close to those displayed by the P3DFFT library written in Fortran, which is probably the most popular library providing FFTs with pencil decomposition. Further performance improvements are possible and will likely arrive in the near future.
The implementation of PencilFFTs is generic in the sense that it can work efficiently with geometries of arbitrary dimension. Moreover, one can choose among all transforms supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms. It’s also possible to combine different transforms along each dimension.
The parallel FFT functionality is implemented on top of a Pencils
module that can also be used by itself, e.g. if one only wants to conveniently deal with MPI-distributed arrays without needing FFTs. I’m considering making this module a separate package. Further functionality that I’d like to associate to this module include exchange of ghost cells between neighbouring MPI processes, and parallel I/O using HDF5.jl.
More details are available in the documentation, including a tutorial and a few more examples.